Apparent error in Python’s priority queue and heapq

After some time spent trying to find a rumored error in Python’s priority queue (Queue.PriorityQueue), I have succeeded in writing a little program that can reproduce this error.

Code is available at https://bitbucket.org/dpb/queue_change_test . It is intended for Python v. 2.6. As I found recently, the problem does not normally arise when simple data structures are placed in the queue and can be sorted in lexicographic order; when stable sort is at issue, however, the queue sometimes fails.

The error occurs in this code when lists are enqueued and some of their values are then changed outside of the queue. The queue should automatically reorder itself, but this does not always happen correctly.

If, after the change to the values, the queue is emptied into another list and those contents then re-enqueued, the new queue is found to be sorted correctly. That shows that the problem has something to do with the queue data structure itself and not the values in the queue. The queue adds values correctly but does not always sort them correctly when they are changed after being added.

The pseudo-random numbers in the program use a seed, to allow the error to be reproduced.

The user has two options:

  • to enter other seeds (default is none, in which case Python attempts to make the results look random by seeding with system time);

  • to re-enqueue the contents, as described above, in order to eliminate sorting errors (default is not to re-enqueue).

The code is commented but as of this writing there is not yet a documentation file.


Update (same night): I have added a second small program, heapq_change_test.py, which performs the same test using the heapq module. Results are the same. No surprise; Queue relies on heapq‘s heappush and heappop for its put and get methods (see the source code at http://hg.python.org/cpython/file/2.7/Lib/Queue.py). The results are the same because the same methods are being called.

Lack of stable sort in Python’s priority queue

I wrote earlier about modeling Python’s priority queue using the “list” data type, in order to find a rumored source of error in implementing the Dijkstra shortest-path algorithm.

I see now that there is an important feature, possible in the list model but apparently not in the priority queue: stable sorting. I’ve altered my queue_test code to illustrate this difference. I am dealing here with v. 2.6 of Python.

When a list of lists is sorted under normal conditions, all items of the sub-lists take part in the sorting: their first item is sorted, then their second item, then their third, and so on. The result looks like this:
[6, 17, 18, 16]
[6, 17, 18, 17]
[6, 17, 19, 6]
[6, 17, 20, 7]
[6, 17, 20, 16]
[6, 18, 1, 5]
[6, 18, 1, 13]
[6, 18, 2, 2]
[6, 18, 2, 5]
[6, 18, 2, 12]
[6, 18, 3, 2]
[6, 18, 3, 6]
[6, 18, 3, 6]
[6, 18, 4, 19]
[6, 18, 4, 20]

All these lists begin with 6, but we cannot tell anything about their original order relative to each other. Those whose second element is 17 all come before those whose second element is 18. Among those beginning [6, 18], those whose third element is 1 all come before those whose third element is 2. And so forth. The sorting process is carried out on all elements, leaving no trace of the original order of the lists.

In a stable sort, however, only some specified item is sorted, and the other items are not examined, so a remnant of the earlier order persists in the sorted result. The result might look like this:
[6, 17, 18, 16]
[6, 9, 20, 4]
[6, 9, 19, 6]
[6, 20, 18, 11]
[6, 20, 6, 12]
[6, 19, 1, 12]
[6, 17, 4, 8]
[6, 12, 9, 20]

You can see that [6, 17, 18, 16] “ought” to be placed after [6, 9, 20, 4], and so forth, but stable sorting has left them out of order, reflecting their original relative position.

Stable sort in Python 2.6 is achieved by use of a key as an argument of the sort() method, and can be refined by use of the itemgetter() method of the operator module. But Python’s priority queue does not seem able to sort by a key or to sort stably. For an illustration, see the current version of my queue_test code, online at https://bitbucket.org/dpb/queue_test.

Testing the reliability of the Python priority queue

Hearing of a problem with the PriorityQueue class in Python’s Queue module, I wrote this short program to test the class. https://bitbucket.org/dpb/queue_test

The Python priority queue is modeled here by a list, which is sorted so as to match the queue’s ascending order. The queue data structure differs from a list in having its size fixed at creation time, making possible a full() method, which lists do not have. Queues also differ from lists in that they can be blocked or have a timeout value applied to their methods, since they are intended for use with threads. Otherwise, lists have a much larger inventory of functionality.

In the this program, the two objects created are filled with random numbers, using both put() and get() so that the queue contracts as well as expands, and then subjected to four tests:

  1. The queue should be full (meaning that full() returns True) after being created and populated.
  2. The sizes of the objects are compared (using qsize() for the queue and len() for the list).
  3. The objects are compared item by item (using get() for the queue and indexing for the list).
  4. After all the values in the queue have been dequeued, it should in fact be empty, meaning that empty() returns True.

This test is repeated in an endless loop which can be terminated with ^C, after which the program reports the number of item-comparisons and errors found. Failure of either tests 1 or 4, however, will terminate the loop.

The user is offered the chance to supply the size of the queue and list, as well as the upper value of the range of random numbers (the lower value is 1). Default values are 10000 for both variables if the user does not enter values.

Note that blocking and timeout is not tested here. The put_nowait() and get_nowait() methods were tested manually but then removed from the code.

Conclusion: No problems have been found in Queue.PriorityQueue as of this writing.


Update, 20111130: I’ve more recently discussed a more interesting conclusion about this problem: see https://brannerchinese.wordpress.com/2011/11/28/lack-of-stable-sort-in-pythons-priority-queue/ and https://brannerchinese.wordpress.com/2011/11/30/apparent-error-in-pythons-priority-queue/.