A*, what's the best data structure for the open list?

3.1k views Asked by At

Disclaimer: I really believe that this is not a duplicate of similar questions. I've read those, and they (mostly) recommend using a heap or a priority queue. My question is more of the "I don't understand how those would work in this case" kind.

In short:

I'm referring to the typical A* (A-star) pathfinding algorithm, as described (for example) on Wikipedia:

https://en.wikipedia.org/wiki/A*_search_algorithm

More specifically, I'm wondering about what's the best data structure (which can be a single well known data structure, or a combination of those) to use so that you never have O(n) performance on any of the operations that the algorithm requires to do on the open list.

As far as I understand (mostly from the Wikipedia article), the operations needed to be done on the open list are as follows:

The elements in this list need to be Node instances with the following properties:

  • position (or coordinates). For the sake of argument, let's say this is a positive integer ranging in value from 0 to 64516 (I'm limiting my A* area size to 254x254, which means that any set of coordinates can be bit-encoded on 16 bits)
  • F score. This is positive floating point value.

Given these, the operations are:

  • Add a node to the open list: if a node with the same position (coordinates) exists (but, potentially, with a different F score), replace it.
  • Retrieve (and remove) from the open list the node with the lowest F score
  • (Check if exists and) retrieve from the list a node for a given position (coordinates)

As far as I can see, the problem with using a Heap or Priority Queue for the open list are:

  • These data structure will use the F-score as sorting criteria
  • As such, adding a node to this kind of data structure is problematic: how do you check optimally that a node with a similar set of coordinates (but a different F Score) doesn't already exist. Furthermore, even if you somehow are able to do this check, if you actually find such a node, but it is not on the top of the Heap/Queue, how to you optimally remove it such that the Heap/Queue keeps its correct order
  • Also, checking for existence and removing a node based on its position is not optimal or even possible: if we use a Priority Queue, we have to check every node in it, and remove the corresponding one if found. For a heap, if such a removal is necessary, I imagine that all remaining elements need to be extracted and re-inserted, so that the heap still remains a heap.
  • The only remaining operating where such a data structure would be good is when we want to remove the node with the lowest F-score. In this case the operation would be O(Log(n)).

Also, if we make a custom data structure, such as one that uses a Hashtable (with position as key) and a Priority Queue, we would still have some operations that require suboptimal processing on either of these: In order to keep them in sync (both should have the same nodes in them), for a given operation, that operation will always be subomtimal on one of the data structures: adding or removing a node by position would be fast on the Hashtable but slow on the Priority Queue. Removing the node with the lowest F score would be fast on the Priority Queue but slow on the Hashtable.

What I've done is make a custom Hashtable for the nodes that uses their position as key, that also keeps track of the current node with the lowest F score. When adding a new node, it checks if its F score is lower than the currently stored lowest F score node, and if so, it replaces it. The problem with this data structure comes when you want to remove a node (whether by position or the lowest F scored one). In this case, in order to update the field holding the current lowest F score node, I need to iterate through all the remaining node in order to find which one has the lowest F score now.

So my question is: is there a better way to store these ?

1

There are 1 answers

3
harold On BEST ANSWER

You can combine the hash table and the heap without slow operations showing up.

Have the hash table map position to index in the heap instead of node.

Any update to the heap can sync itself (which requires the heap to know about the hash table, so this is invasive and not just a wrapper around two off-the-shelf implementations) to the hash table with as many updates (each O(1), obviously) as the number of items that move in the heap, of course only log n items can move for an insertion, remove-min or update-key. The hash table finds the node (in the heap) to update the key of for the parent-updating/G-changing step of A* so that's fast too.