Memory efficient std::map alternative

9.1k views Asked by At

I'm using a std::map to store about 20 million entries. If they were stored without any container overhead, it would take approximately 650MB of memory. However, since they are stored using std::map, it uses up about 15GB of memory (i.e. too much).

The reason I am using an std::map is because I need to find keys that are equal to/larger/smaller than x. This is why something like sparsehash wouldn't work (since, using that, I cannot find keys by comparison).

Is there an alternative to using std::map (or ordered maps in general) that would result in less memory usage?

EDIT: Writing performance is much more important than reading performance. It will probably only read ~10 entries, but I don't know which entries it will read.

4

There are 4 answers

0
MiJyn On BEST ANSWER

Turns out the issue wasn't std::map.

I realized was using 3 separate maps to represent various parts of the same data, and after slimming it down to 1, the difference in memory was entirely negligible.

Looking at the code a little more, I realized code I had written to free a really expensive struct (per element of the map) didn't actually work.

Fixing that part, it now uses <1GB of memory, as it should! :)


TL;DR: std::map's overhead is entirely negligible for this. The issue was my own.

0
EmDroid On

Are you writing on-the-fly or one time before the lookup is done? If the later is the case, you shouldn't need a map, you could use std::vector and one-time sort.

You could just insert everything unsorted to the vector, sort one-time after everything is there (O(N * log N) as well as std::map, but much better performance characteristics) and then lookup in the sorted array (O(logN) as the std::map).

And especially if you know the number of elements before reading and could reserve the vector size upfront, that could work pretty well. Or at least if you know some "upper bound" to reserve perhaps slightly more than actually needed but avoid the reallocations.

1
Angew is no longer proud of SO On

One alternative would be to use flat_map from Boost.Containers: that supports the same interface as std::map, but is backed by a sorted contiguous array (think std::vector) instead of a tree. Or hand-roll your own solution based on the same idea.

Its performance characteristic is of course different, due to the different back-end. It's up to you to evaluate whether it's usable in your case.

1
Bathsheba On

Given your requirements:

  1. Insertion needs to be quick
  2. There are many elements to read
  3. Read-back can be slow
  4. You only read back data once

I'd consider typedef std::pair<uint64, thirty_six_byte_struct> element; and populate a std::list<element>. That will be hard to beat in terms of performance.

For reading back, I'd simply traverse the linked list, checking at every point if you need one of those elements. That's a O(N) traversal but as you say, you'll only do that once.