Why is this python priority queue failing to heapify?

105 views Asked by At

Why is this priority queue failing to heapify? Where (150, 200, 200) are the priority values assigned to the dictionaries

import heapq

priority_q = [
    (150, {'intel-labels': {'timestamp': 150}}), 
    (200, {'intel-labels': {'timestamp': 200}}), 
    (200, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
]

heapq.heapify(priority_q)

print( heapq.nlargest(2, priority_q))

The exception:

Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   TypeError: '<' not supported between instances of 'dict' and 'dict'

The below, however, works..

priority_q = [
    (150, {'intel-labels': {'timestamp': 150}}), 
    (200, {'intel-labels': {'timestamp': 200}}), 
    (201, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
]

heapq.heapify(priority_q)

Why is this?

3

There are 3 answers

0
Luatic On BEST ANSWER

Why is this?

heapq heapifies the heap according to comparisons. Tuples are compared lexicographically. If your first values in all tuples are distinct, you will never compare the second values. This is the case for your second example. If you have a duplicate value (200 in your first example), the second elements will be compared. These are dicts (which can't be compared), so this raises an error.

As for a proper fix: The heapq docs got you covered. Their suggestion is to use triples such that the second value is an autoincremented tie breaker, or to use a data class that only compares the priority field.

Note that these approaches differ slightly: With a tie breaker, there won't be any ties; nlargest will always give you just a single element - the one which was inserted first. If you want it to return all tying elements, you shouldn't use this approach.

Applying the second approach to your example, the following works:

import heapq
from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any=field(compare=False)

priority_q = [
    PrioritizedItem(150, {'intel-labels': {'timestamp': 150}}), 
    PrioritizedItem(200, {'intel-labels': {'timestamp': 200}}), 
    PrioritizedItem(200, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
]
heapq.heapify(priority_q)
print(heapq.nlargest(2, priority_q))

prints

[PrioritizedItem(priority=200, item={'intel-labels': {'timestamp': 200}}), PrioritizedItem(priority=200, item={'intel-labels': {'timestamp': 200, 'xx': 'xx'}})]
0
matszwecja On

Python doesn't know how to compare dicts, so it can't solve the problem if there is a tie in first value.

If there is no tie on the first value, the code works fine:

import heapq


print(heapq.nlargest(2, [(200, {"a"}), (300, {"b"}), (200, {"c"})]))



priority_q = [
    (150, {'intel-labels': {'timestamp': 150}}), 
    (200, {'intel-labels': {'timestamp': 200}}), 
    (300, {'intel-labels': {'timestamp': 200, 'xx': 'xx'}})
]

heapq.heapify(priority_q)

print( heapq.nlargest(2, priority_q))
0
S.B On

heap.heapify is comparing the items. Your items are tuple.

So you need to know how tuples are compared. It's element wise, Python starts from the first item in tuples and compare them, if it could conclude the result, it stops there and returns the result, otherwise(the first items are equal) it goes to check the second items in both tuples and so on.

Take a look at this:

t1 = (8, {"a": 10})
t2 = (10, {"b": 20})

print(t1 < t2)  # True

It's true because 8 is less than 10 and that's it. No further inspection is needed.

How about this one?

t1 = (10, {"a": 10})
t2 = (10, {"b": 20})

print(t1 < t2)

output:

Traceback (most recent call last):
  File "", line 4, in <module>
    print(t1 < t2)
          ^^^^^^^
TypeError: '<' not supported between instances of 'dict' and 'dict'

It's now an error. dict objects can not be used as the operands of < or >.

>>> {"a": 10} < {"b": 20}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '<' not supported between instances of 'dict' and 'dict'

Solution:
Unfortunately you cannot pass a sorting function as the key parameter like sorted() to specify how you would want this comparison to be, but, you can wrap the items inside another custom object and implement the __lt__ based on the first item of the tuple. You can manually do that or use a @dataclass. The documentation mentioned this in particular:

https://docs.python.org/3/library/heapq.html#priority-queue-implementation-notes

Here is a simple custom wrapper:

import heapq
from typing import Any


class PriorityItem:
    __slots__ = ("priority", "count", "item")
    counter = 0

    def __init__(self, priority: int, item: Any) -> None:
        self.priority = priority
        self.item = item
        self.count = PriorityItem.counter
        PriorityItem.counter += 1

    def __lt__(self, item: object) -> bool:
        if not isinstance(item, PriorityItem):
            return NotImplemented
        return (self.priority, self.count) < (item.priority, item.count)

    def __repr__(self) -> str:
        return f"PriorityItem(priority={self.priority}, item={self.item})"


priority_q = [
    (150, {"intel-labels": {"timestamp": 150}}),
    (200, {"intel-labels": {"timestamp": 200}}),
    (200, {"intel-labels": {"timestamp": 200, "xx": "xx"}}),
]

new_priority_q = [PriorityItem(*i) for i in priority_q]
heapq.heapify(new_priority_q)
print(heapq.nlargest(2, new_priority_q))