Why is this priority queue failing to heapify? Where (150, 200, 200) are the priority values assigned to the dictionaries
import heapq
priority_q = [
(150, {'intel-labels': {'timestamp': 150}}),
(200, {'intel-labels': {'timestamp': 200}}),
(200, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
]
heapq.heapify(priority_q)
print( heapq.nlargest(2, priority_q))
The exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'dict' and 'dict'
The below, however, works..
priority_q = [
(150, {'intel-labels': {'timestamp': 150}}),
(200, {'intel-labels': {'timestamp': 200}}),
(201, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
]
heapq.heapify(priority_q)
Why is this?
heapq
heapifies the heap according to comparisons. Tuples are compared lexicographically. If your first values in all tuples are distinct, you will never compare the second values. This is the case for your second example. If you have a duplicate value (200
in your first example), the second elements will be compared. These are dicts (which can't be compared), so this raises an error.As for a proper fix: The
heapq
docs got you covered. Their suggestion is to use triples such that the second value is an autoincremented tie breaker, or to use a data class that only compares the priority field.Note that these approaches differ slightly: With a tie breaker, there won't be any ties;
nlargest
will always give you just a single element - the one which was inserted first. If you want it to return all tying elements, you shouldn't use this approach.Applying the second approach to your example, the following works:
prints