Can I store a dict with bitstring values in memory without padding?

259 views Asked by At

I have a dict from some hash key to a bitstring. The bitstring can be variable length, but are generally < 160 bits and usually <80. I have about 80M key value pairs.

How can I store this data structure in as little memory as possible? I don't want to pad the bitstrings, or I will lose quite a bit of space (no pun intended).

I assume that I will have to store a byte at the beginning giving the length of the bit string. That's okay.

What is the most memory-efficient way to store this dict in memory?

I would prefer to use Python, but am open to other choices.

1

There are 1 answers

0
Vlad On BEST ANSWER

If you're referring to padding the bitstrings to an integral number of bytes, then you could store the concatenation of all the initial bitstrings in a single bitstring, and keep a dictionary whose values are tuples of the form (bit position, length).

Problem is, if I did the math right, then the length of this larger bitstring can be over 12 billion bits, so the bit position would be need a couple of bits over an int. Then if you need to pad the bit position itself we're back to square one.

If, however, the number of distinct lengths is < 64, you could fit the length field in 6 bits, and you end up with a dictionary mapping hash keys to 5-byte tuples indexing into the single bitstring. Would this work for you?