Best way to load a mmap dictionary in Python without deserializing

211 views Asked by At

Context:

I've got Python processes running on the same container and I want to be able to share a read-only key-value object between them.

I'm aware I could use something like Redis to share that info, but I'm looking for optimal solution in regards to latency as well as memory usage.

My idea was to generate a binary object on disk and open that file using mmap

Question: That brings me to my question, is there a binary format or library that would load a read-only file in ram an offer a dictionary interface, without the need to deserialize the file content? This way I could mmap the file in every process, all process would be re-using the same RAM for that file and I would be able to access the file's content with a dict-like interface?

I'm looking a file/object format for dictionaries, similar to what Parquet is to columnar storage, that could be consumed in read-only mode by python.

2

There are 2 answers

1
amirouche On BEST ANSWER

Since your data is read-only, and you do not need to iterate over keys in lexicographic order: use constant database cdb. They are bindings for python. You might want to look into an easy grab in the stdlib called dbm.

3
Asad Awadia On

If the size of the file is small - in every process slurp it into memory

Else use sqlite

My idea was to generate a binary object on disk and open that file using mmap

You should not be using mmap