I'm using the pairing heap implementation found here: https://github.com/jemalloc/jemalloc/commits/dev/include/jemalloc/internal/ph.h
Once in a while though I need to iterate over the N min values in the heap (where N is bound by the number of elements in the heap but usually less). Is there any efficient way of doing this?
My current approach is calling remove_first()
N times and then pushing all the popped elements back via insert()
.
Your approach is
O(k log n)
, wherek
is the number of items you want to print andn
is the number of elements in the heap. It looks like these operations have been optimized quite extensively in the implementation you're using. There is a way to traverse the heap with another heap to solve your goal inO(k log k)
instead, which is faster with the log factor onk
instead ofn
. The approach is fairly simple: Maintain a min-heap of values (with pointers to the nodes in the tree) initialized to the root (which is the minimum value in the heap). Then, you can pop off the auxiliary heap and insert the current node's children in the main heap, which is faster since only the nodes which could possibly be the next smallest value are in the queue (which are the neighbors of the nodes you've taken so far).While the big-O complexity of this approach is technically better, it will almost certainly be much slower in practice. The implementation of this min-pairing heap seems very efficient, and it would almost certainly outweigh the overhead of creating an auxiliary heap and performing this search. Not to mention the extra code complexity with the possibility of introducing bugs, it's probably not worth it.
I'm pretty sure that you can't do better than
O(k log k)
. My intuition for this is that there's probably a constant time reduction to sorting if you could do better, but comparison-based sorting (iirc) has been proven to be impossible to solve in faster thanO(n log n)
. This intuition could be wrong, but I think it's probably pretty close to the truth. Best of luck!