I have a rather large dictionary with about 40 million keys, which I naively stored by writing {key: value, key: value, ...} into a text file. I didn't consider that I could never realistically access this data, because Python has an aversion to loading and evaluating a 1.44 GB text file as a dictionary.
I know I could use something like shelve to access the data without reading all of it at once, but I'm not sure how I would even convert this text file to a shelve file without regenerating all the data (which I would prefer not to do). Are there any better alternatives for storing, accessing, and potentially later changing this much data? If not, how should I go about converting this monstrosity to a format usable by shelve?
If it matters, the dictionary is of the form {(int, int, int, int): [[int, int], Bool]}
https://github.com/dagnelies/pysos
It works like a normal Python dict, but has the advantage that it's much more efficient than shelve on Windows and is also cross-platform, unlike shelve, where the data storage differs based on the OS.
To install:
Usage:
Just to give a ballpark figure, here is a mini benchmark (on my Windows laptop):
The resulting file was about 3.5 MB.
So, in your case, if a million key/value pairs take roughly 1 minute to insert, your 40 million keys would take around 40 minutes, so plan for up to an hour to insert it all. Of course, the machine's specs can influence that a lot; it's just a very rough estimate.
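As for getting the existing text file into such a store without regenerating the data, here is a minimal sketch of one way to do it. It assumes every entry in the file literally looks like (int, int, int, int): [[int, int], True/False], as described in the question. The names ENTRY, iter_entries and convert, the chunk size, and the choice of repr() to turn tuple keys into strings are all my own for illustration; they are not part of pysos or shelve. The idea is to read the file in chunks and pull out complete entries with a regular expression, so the whole 1.44 GB never has to be evaluated at once.

```python
import io
import re

# Matches one "(a, b, c, d): [[x, y], True|False]" entry (assumed file layout).
ENTRY = re.compile(
    r"\(\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*\)\s*:\s*"
    r"\[\s*\[\s*(-?\d+)\s*,\s*(-?\d+)\s*\]\s*,\s*(True|False)\s*\]"
)


def iter_entries(fh, chunk_size=1 << 20):
    """Yield (key, value) pairs from an open text file, one chunk at a time."""
    buf = ""
    while True:
        chunk = fh.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        end = 0
        for m in ENTRY.finditer(buf):
            a, b, c, d, x, y = map(int, m.groups()[:6])
            yield (a, b, c, d), [[x, y], m.group(7) == "True"]
            end = m.end()
        # Keep only the unmatched tail; it may hold an entry that was cut
        # off at the chunk boundary and will be completed by the next read.
        buf = buf[end:]


def convert(fh, store, encode_key=repr, chunk_size=1 << 20):
    """Copy every entry into `store` (any dict-like object); return the count."""
    count = 0
    for key, value in iter_entries(fh, chunk_size):
        # String-keyed stores such as shelve cannot take tuple keys directly.
        store[encode_key(key)] = value
        count += 1
    return count


# Tiny in-memory demo with made-up data and a deliberately small chunk size.
sample = io.StringIO("{(1, 2, 3, 4): [[5, 6], True], (-1, 0, 7, 8): [[9, 10], False]}")
demo = {}
converted = convert(sample, demo, chunk_size=8)
```

For the real file you would pass the opened text file as fh and a persistent dict-like object as store, for example the object returned by shelve.open(...), and later look entries up with the same encoding, e.g. db[repr((1, 2, 3, 4))]. If the file contains anything that does not fit the assumed pattern, this sketch silently skips it, so it is worth comparing the returned count against the number of keys you expect.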