Amortised complexity of an operation on the std::map data structure


Suppose I have Q operations of the following kind on a map storing (key, value) pairs, where keys are comparable via operator<:

Given l, r and a value x, erase all keys already present in the range [l, r) and insert the pair (r, x).

My question is: does the naive strategy of just erasing the keys in the range and then inserting have good amortized complexity (maybe O(log n) or so)? I have an intuitive feeling that if one operation does a lot of deletions, future operations will be cheaper because of the reduced size. Any ideas are appreciated.

More detailed description of the "naive" algorithm: binary search to find the smallest key_l >= l and the largest key_r < r, then delete all the keys in the range [key_l, key_r] one by one (overall this is O(log n + k log n), where k is the number of erased keys, i.e. O(n log n) in the worst case).
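For concreteness, here is a minimal sketch of that naive approach on a std::map, assuming integer keys and values (the function name is illustrative, not part of the question):

```cpp
#include <cstdint>
#include <map>

// Sketch of the naive operation: erase every key in [l, r), then insert (r, x).
void range_replace(std::map<std::int64_t, std::int64_t>& m,
                   std::int64_t l, std::int64_t r, std::int64_t x) {
    auto first = m.lower_bound(l);  // smallest key >= l
    auto last  = m.lower_bound(r);  // first key >= r, i.e. one past the largest key < r
    m.erase(first, last);           // erases every key in [l, r)
    m.insert_or_assign(r, x);       // C++17: insert (r, x), overwrite if r already present
}
```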


There are 2 answers

btilly

The algorithm you describe does not achieve the complexity that you want.

The problem is that when you erase a range and then insert new data, you can wind up with more or less data than you started with. If this happens in the middle of a vector, up to half of your data needs to be moved.

Moving big blocks of data is one of the fastest things you can do to data. But moving O(n) data still takes O(n) time - just with good constants.
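To illustrate, here is a hypothetical sketch (not from the answer) of the same operation on a sorted std::vector of pairs, where that O(n) data movement shows up explicitly:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

using KV = std::pair<std::int64_t, std::int64_t>;  // (key, value), kept sorted by key

// Illustrative only: erase keys in [l, r) and insert (r, x) in a sorted vector.
void range_replace_vec(std::vector<KV>& v, std::int64_t l, std::int64_t r, std::int64_t x) {
    auto key_less = [](const KV& a, std::int64_t k) { return a.first < k; };
    auto first = std::lower_bound(v.begin(), v.end(), l, key_less);
    auto last  = std::lower_bound(v.begin(), v.end(), r, key_less);
    auto pos = v.erase(first, last);      // shifts the tail left: O(n) element moves
    if (pos != v.end() && pos->first == r) {
        pos->second = x;                  // key r already present: just overwrite
    } else {
        v.insert(pos, KV{r, x});          // shifts the tail right: O(n) element moves
    }
}
```

The binary searches are cheap; the cost is in shifting everything after the edited range.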

We have many better solutions. Data structures that solve this fully include red-black trees, B-trees, skip lists, and LevelDB. There are many subtle tradeoffs, but all of them will be amortized O(log n) time for this operation.

Gene

Naive single erase and insert operations will have O(k log n) run time, where n is the initial size of the tree and k is the number of elements in the range. Doing more than one of these doesn't change that cost. The value of n can certainly get smaller with each successive operation (when k > 1), but not in a way that admits a better amortized bound. If you start with a tree of size N and do Q operations that take the tree down to size M, then you've done N - M + Q deletions (counting each inserted key as new) and Q insertions. Period.
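As a hypothetical sanity check (not from the answer), the following driver runs Q random operations of this kind and tallies the erasures; the totals should satisfy the accounting above, i.e. deletions = initial size - final size + fresh insertions:

```cpp
#include <cstdint>
#include <iostream>
#include <iterator>
#include <map>
#include <random>

int main() {
    std::mt19937_64 rng(42);
    std::map<std::int64_t, std::int64_t> m;

    // Build an initial tree of (up to) 100000 random keys.
    for (std::int64_t i = 0; i < 100000; ++i)
        m.emplace(static_cast<std::int64_t>(rng() % 1000000), i);
    const std::int64_t initial = static_cast<std::int64_t>(m.size());

    std::int64_t total_erased = 0, fresh_inserts = 0;
    const int Q = 1000;
    for (int q = 0; q < Q; ++q) {
        const std::int64_t l = static_cast<std::int64_t>(rng() % 1000000);
        const std::int64_t r = l + static_cast<std::int64_t>(rng() % 5000) + 1;

        auto first = m.lower_bound(l), last = m.lower_bound(r);
        total_erased += std::distance(first, last);   // O(k) walk, for counting only
        m.erase(first, last);                         // erase keys in [l, r)
        if (m.insert_or_assign(r, q).second)          // insert (r, q); .second == freshly added
            ++fresh_inserts;
    }

    const std::int64_t final_size = static_cast<std::int64_t>(m.size());
    std::cout << "initial=" << initial << " final=" << final_size
              << " erased=" << total_erased << " fresh inserts=" << fresh_inserts << '\n';
    // Expect: total_erased == initial - final_size + fresh_inserts.
}
```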

Assuming, as the std::map docs say, that red-black trees are used internally, searching for the first and last elements of the range and deleting the elements between them would require only O(k + log n) time. But it would leave a collection of disjoint subtrees. These must be merged and re-balanced - with red/black flags updated - to create a new, valid tree. I don't know how efficiently this can be done, but a guess is O((n-k) log(n-k)). I have no idea whether any of the C++ library implementations do it. The benefit doesn't seem worthwhile even if my guess at the time bound is high.