Efficient persistent storage for lists in Python

I have a (key, value) map where each key is a string and each value is a fairly large list of heterogeneous lists (at most about 250 items). Each inner list mixes strings and numbers, and I may want to iterate over it. What are the best options for persistently storing thousands of such (key, value) pairs for efficient retrieval? If I use sqlite, I would need to create a table for each key and then map the lists to individual records in the database. Are there better, more efficient options if the goal is fast retrieval of the list of lists for a particular key? Here is a short example. Say animals is a map of keys to lists of lists. Sample data looks like this:

    animals = {
        "Lion": [["Siberian", 203, "Tanzania", 123.56], ["Russian", 321, "Timbuktu", 23423.2]],
        "Tiger": [["White", 121, "Australia", 1211.1], ["Indian", 111, "India", 1241.5]],
    }

So I want to persist this data structure and be able to index it quickly by animal name (always unique) to get the list of lists for the animal I care about. If the inner lists for each animal are of fixed length with fixed fields, can I exploit that somehow to improve efficiency?
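
For comparison, sqlite does not need one table per key: a single table with an indexed key column can hold every record, and because the inner lists have fixed fields, each field can become a typed column. Below is a minimal sketch assuming four fields per record, as in the sample; the column names (variety, code, place, value) and the helpers save and load are hypothetical.

```python
import sqlite3

# Sample data from the question: every record has the same four fields.
animals = {
    "Lion": [["Siberian", 203, "Tanzania", 123.56], ["Russian", 321, "Timbuktu", 23423.2]],
    "Tiger": [["White", 121, "Australia", 1211.1], ["Indian", 111, "India", 1241.5]],
}


def save(conn, data):
    """Store all records in ONE table, keyed by animal name (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS records (
               animal  TEXT NOT NULL,
               variety TEXT,
               code    INTEGER,
               place   TEXT,
               value   REAL
           )"""
    )
    # The index turns "WHERE animal = ?" into an index lookup instead of a full scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_animal ON records (animal)")
    rows = [(animal, *rec) for animal, recs in data.items() for rec in recs]
    conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def load(conn, animal):
    """Return the list of lists for one animal, in insertion order."""
    cur = conn.execute(
        "SELECT variety, code, place, value FROM records "
        "WHERE animal = ? ORDER BY rowid",
        (animal,),
    )
    return [list(row) for row in cur]


conn = sqlite3.connect(":memory:")  # pass a file path instead to persist on disk
save(conn, animals)
print(load(conn, "Tiger"))
```

An unknown name simply returns an empty list. The typed columns also let sqlite filter or sort on individual fields without loading whole lists.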

There are 3 answers

Andrzej Pronobis

I would suggest one of the fast JSON libraries. Several speed comparisons online suggest that JSON can be as fast as, or even faster than, pickle. See, for example, http://lvsl.github.io/2011/12/28/python-serialization-benchmark.html and https://blog.hartleybrody.com/python-serialize/
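
A minimal sketch of persisting the whole map as one JSON file and then indexing it by name. It uses the standard-library json module so it runs without extra installs; ujson offers similarly named dumps/loads functions. The helper names save_json and load_json are hypothetical. Note that the whole file is parsed into memory on load, so this suits data that fits comfortably in RAM.

```python
import json
import os
import tempfile

animals = {
    "Lion": [["Siberian", 203, "Tanzania", 123.56], ["Russian", 321, "Timbuktu", 23423.2]],
    "Tiger": [["White", 121, "Australia", 1211.1], ["Indian", 111, "India", 1241.5]],
}


def save_json(path, data):
    # Write the entire mapping as a single JSON document.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)


def load_json(path):
    # Parse the entire document back into a dict of lists of lists.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "animals.json")
    save_json(path, animals)
    store = load_json(path)  # load once...

print(store["Lion"])  # ...then each lookup is a plain dict access
```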

There are several JSON serialization libraries to choose from, and again there are comparisons online, e.g. https://medium.com/@jyotiska/json-vs-simplejson-vs-ujson-a115a63a9e26
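
Published benchmarks depend on the data and on library versions, so it can be worth timing the round trip on data shaped like your own. A minimal sketch using timeit; the synthetic sizes (100 keys of 250 records each) are hypothetical, and ujson is measured only if it happens to be installed.

```python
import json
import pickle
import timeit

try:
    import ujson  # optional third-party package; skipped if not installed
except ImportError:
    ujson = None

# Synthetic data shaped like the question's (hypothetical sizes).
data = {
    f"animal{i}": [[f"variety{j}", j, "place", j * 1.5] for j in range(250)]
    for i in range(100)
}


def bench(name, dumps, loads, number=5):
    """Average dump/load time in seconds for one serializer."""
    blob = dumps(data)
    t_dump = timeit.timeit(lambda: dumps(data), number=number) / number
    t_load = timeit.timeit(lambda: loads(blob), number=number) / number
    print(f"{name:7s} dump {t_dump * 1e3:8.2f} ms  load {t_load * 1e3:8.2f} ms  length {len(blob)}")
    return t_dump, t_load


results = {
    "json": bench("json", json.dumps, json.loads),
    "pickle": bench(
        "pickle",
        lambda d: pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL),
        pickle.loads,
    ),
}
if ujson is not None:
    results["ujson"] = bench("ujson", ujson.dumps, ujson.loads)
```

The printed numbers will vary by machine, which is exactly why measuring locally is more useful than relying on someone else's table.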

I would suggest looking into ujson, which seems to be very fast and has one big advantage over, e.g., pickle: the data is easy to inspect because it is saved in a human-readable format. On the other hand, pickle is a bit easier to use with custom types, although you can still define custom encoders for custom types in JSON. Overall, choose JSON if you care more about human readability, and pickle if what really matters is writing a few fewer lines of code for custom types.
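
To illustrate that trade-off, here is a minimal sketch of a custom encoder/decoder pair using the standard json module's default and object_hook parameters. The Sighting dataclass and the "__sighting__" tag are hypothetical; pickle handles the same objects with no extra code.

```python
import json
import pickle
from dataclasses import asdict, dataclass


@dataclass
class Sighting:  # hypothetical custom type, not from the question
    variety: str
    code: int
    place: str
    value: float


def encode(obj):
    # json calls this only for objects it cannot serialize natively.
    if isinstance(obj, Sighting):
        return {"__sighting__": True, **asdict(obj)}
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def decode(d):
    # json calls this for every decoded object (dict); rebuild the tagged ones.
    if d.get("__sighting__"):
        return Sighting(**{k: v for k, v in d.items() if k != "__sighting__"})
    return d


animals = {"Lion": [Sighting("Siberian", 203, "Tanzania", 123.56)]}

text = json.dumps(animals, default=encode)  # readable text, custom hooks needed
restored = json.loads(text, object_hook=decode)

via_pickle = pickle.loads(pickle.dumps(animals))  # no hooks needed, binary output
print(restored["Lion"][0])
```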

Ami Tavory

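As Blender states in the comment, pickle is a reasonable choice. On Python 2, make sure not to use the pure-Python pickle module, though, and use the C-based cPickle instead; on Python 3, pickle uses its C implementation automatically and cPickle no longer exists. Alternatively, consider dill.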
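
A minimal sketch with the standard-library pickle module. It also shows shelve, a standard-library module built on pickle that this answer does not mention: shelve stores each key's value separately, which matches the "fetch one animal by name" access pattern without unpickling everything. As with any pickle-based format, only load files from trusted sources.

```python
import os
import pickle
import shelve
import tempfile

animals = {
    "Lion": [["Siberian", 203, "Tanzania", 123.56], ["Russian", 321, "Timbuktu", 23423.2]],
    "Tiger": [["White", 121, "Australia", 1211.1], ["Indian", 111, "India", 1241.5]],
}

with tempfile.TemporaryDirectory() as tmpdir:
    # Option 1: pickle the whole dict into one file; loading reads everything back.
    pkl_path = os.path.join(tmpdir, "animals.pkl")
    with open(pkl_path, "wb") as f:
        pickle.dump(animals, f, protocol=pickle.HIGHEST_PROTOCOL)
    with open(pkl_path, "rb") as f:
        loaded = pickle.load(f)

    # Option 2: shelve pickles each value separately under its string key,
    # so reading one animal does not unpickle the others.
    shelf_path = os.path.join(tmpdir, "animals_shelf")
    with shelve.open(shelf_path) as db:
        db.update(animals)
    with shelve.open(shelf_path, flag="r") as db:
        tiger = db["Tiger"]

print(tiger)
```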

Alexander

Depending on your needs, you may want to consider Redis, which is an excellent key-value database. This tutorial provides a relatively quick introduction.
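
A minimal sketch of one possible layout, assuming the redis-py client package: each animal is stored as one Redis string under a hypothetical "animal:<name>" key, holding its list of lists encoded as JSON. So that the sketch runs without a server, the demo uses a hypothetical in-memory stand-in exposing only set and get; with redis-py you would pass redis.Redis(host="localhost", port=6379) instead. Its get returns bytes, which json.loads accepts.

```python
import json


class InMemoryClient:
    """Hypothetical stand-in for a redis-py client, exposing only set/get."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        # redis-py hands values back as bytes by default; mimic that here.
        self._data[key] = value.encode("utf-8") if isinstance(value, str) else value
        return True

    def get(self, key):
        return self._data.get(key)  # None for a missing key, like Redis


def save_animals(client, data, prefix="animal:"):
    # One Redis string per animal, holding its list of lists as JSON.
    for name, records in data.items():
        client.set(prefix + name, json.dumps(records))


def load_animal(client, name, prefix="animal:"):
    raw = client.get(prefix + name)
    return None if raw is None else json.loads(raw)


animals = {
    "Lion": [["Siberian", 203, "Tanzania", 123.56], ["Russian", 321, "Timbuktu", 23423.2]],
    "Tiger": [["White", 121, "Australia", 1211.1], ["Indian", 111, "India", 1241.5]],
}

client = InMemoryClient()  # swap in redis.Redis(...) against a running server
save_animals(client, animals)
print(load_animal(client, "Lion"))
```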