How does heapq resolve equal values?

3.8k views Asked by At

Coming from Java, I am trying to implement A* algorithm in python and I'm having trouble sorting vertices in my graph that have equal f scores. I'm trying to do so using heapq, and after some debugging I noticed if I would like to push a vertex that has some f score equal to some other pre-existing vertices in the heap, the order would be messed up. I'm thinking of implementing my own priority queue now. I would like to know how this works.

An illustration of the behaviour is like so:

>>> mylist = [1, 2, 5, 4, 3]
>>> heapq.heapify(mylist)
>>> mylist
>>> [1, 2, 3, 4, 5]
>>> heapq.heappush(mylist, 1)
>>> mylist
>>> [1, 2, 1, 4, 5, 3]

Here is the actual code I'm implementing for context:

class Node(object):

def __init__(self, name, x_coordinate, y_coordinate, obstacle_flag=False):
    self.name = name  # possible values should only be ' ', 'A-Z', '*'
    self.coordinates = (x_coordinate, y_coordinate)  # this will uniquely identify the node
    self.obstacle = obstacle_flag  # if the name is '*' the obstacle is set to True
    self.neighbors = {}  # list of neighbors of this node
    self.set_obstacle()

...

class Vertex(Node):
def __init__(self, name, x_coordinate, y_coordinate, obstacle_flag):
    super(Vertex, self).__init__(name, x_coordinate, y_coordinate, obstacle_flag)
    self.g_actual_cost = 10000  
    self.h_cost = 0  # the cost given by the heuristic function
    self.previous_vertex = None
    self.total_cost = self.g_actual_cost + self.h_cost

def __lt__(self, other):
    return self.total_cost < other.total_cost

def __eq__(self, other):
    if isinstance(other, Vertex):
        return self.total_cost == other.total_cost
    return NotImplemented
2

There are 2 answers

0
ShadowRanger On BEST ANSWER

"Heap" doesn't mean "sorted" (if it did, you couldn't build it for arbitrary values in O(n) time). It means it satisfies the heap invariant, for a min-heap like Python's, this just means that the smallest value is at the top (if there's a tie, an arbitrary value wins), and that each node's children are always equal to or larger than the node. You find the children for a node at index i by looking at index 2*i+1 and 2*i+2, so in your example, putting P under each parent, and C under each of their children, we have:

[1, 2, 1, 4, 5, 3]
#P  C  C
#0  1  2

[1, 2, 1, 4, 5, 3]
#   P     C  C
#   1     3  4

[1, 2, 1, 4, 5, 3]
#      P        C
#      2        5  (only one child since heap lacks another element)

As you can see, in each case, the P value is less than or equal to all of its children; the heap-invariant is maintained, which is all that is necessary for heappop, heappush, etc. to continue to work.

Note that for objects like your Vertex class, where the comparison is based on one value, but the objects have other state, a heap isn't stable; two objects with the same total_cost could come out in either order, regardless of which was placed in the heap first. To avoid this problem, you have to add your own fallback comparison by "decorating" each value. A simple approach would be:

 from itertools import count

 original_list = [Vertex(...), Vertex(...), ...]  # Define initial list of vertices
 numbermaker = count()
 decorated_list = [(v, next(numbermaker)) for v in original_list]
 heapq.heapify(decorated_list)

 # When you want to add new elements:
 heapq.heappush(decorated_list, (Vertex(...), next(numbermaker)))

 # When you want to get the top of the heap's value:
 top = decorated_list[0][0]  # First [0] gets decorated top, second strips decoration

 # When you want to pop off the top:
 top = heapq.heappop(decorated_list)[0]
3
wim On

The order is not messed up. The heap should have for all indices:

a[i] <= a[2*i + 1]
a[i] <= a[2*i + 2]

That doesn't imply the array is sorted. In your case the heap invariant is still satisfied, and you can consume from it as a priority queue.

>>> heap = [1, 2, 1, 4, 5, 3]
>>> import heapq
>>> heapq.heappop(heap)
1
>>> heapq.heappop(heap)
1
>>> heapq.heappop(heap)
2
>>> heap
[3, 4, 5]