How can I use the Python io module to build a memory-resident data structure?

I'm trying to write data collected from a data acquisition system to locations in memory, and then asynchronously perform further processing on the data, or write it out to a file for offline processing. I'm structuring the architecture this way to isolate data acquisition from data analysis and transmittal, which buys us some flexibility for future expansion and improvement, but it is definitely more complex than simply writing the data directly to a file.

Here is some exploratory code I wrote.

#io.BufferedRWPair test
from io import BufferedRWPair

# Samples of instrumentation data to be stored in RAM
test0 = {'Wed Aug  1 16:48:51 2012': ['20.0000', '0.0000', '13.5', '75.62', '8190',
                    '1640', '240', '-13', '79.40']}
test1 = {'Wed Aug  1 17:06:48 2012': ['20.0000', '0.0000', '13.5', '75.62', '8190',
             '1640', '240', '-13', '79.40']}

# Attempt to create a RAM-resident object into which to read the data.
data = BufferedRWPair(' ', ' ', buffer_size=1024)

data.write(test0)
data.write(test1)

print data.getvalue()

data.close()

There are a couple of issues here (maybe more!):

-> 'data' is a variable name for the construct I'm trying to assemble: an array-like structure that should hold sequential records, with each record containing several process data measurements, prefaced by a timestamp that can serve as a key for retrieval. I offer this as background on my design intent, in case the code is too vague to reflect my actual questions.

-> This code does not work, because the 'data' object never gets created. I'm just trying to open an empty buffer, to be filled later, but BufferedRWPair expects two existing objects, one readable and one writeable, which my code does not supply. Because of this, I'm not sure I'm even using the right construct, which leads to these questions:

  1. Is io.BufferedRWPair the best way to deal with this data? I've tried StringIO, since I'm on Python 2.7.2, but no luck. I like the idea of a record with a timestamp key, hence my choice of the dict structure, but I'd sure look at alternatives. Are there other io classes I should look at instead?

  2. One alternative I've looked at is the DataFrame construct from the NumPy/SciPy/pandas world. It looks interesting, but it seems to require a lot of additional modules, so I've shied away from it. I have no experience with any of those modules -- should I be looking at these more complex packages to get what I need?

I'd welcome any suggestions or feedback, folks... Thanks for checking out this question!

There are 2 answers

Bryan Oakley (Best Answer)

If I understand what you are asking, using an in-memory SQLite database might be the way to go. SQLite allows you to create a fully functioning SQL database entirely in memory. Instead of reads and writes you would do selects and inserts.
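A rough sketch of that approach with the standard-library sqlite3 module (the table and column names here are just placeholders for illustration):

import sqlite3

# An in-memory database -- nothing is ever written to disk.
conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE samples (
                    timestamp TEXT PRIMARY KEY,
                    ch0 REAL, ch1 REAL, ch2 REAL)""")

# Insert a record keyed by its timestamp, roughly matching the dict layout in the question.
conn.execute("INSERT INTO samples VALUES (?, ?, ?, ?)",
             ('Wed Aug  1 16:48:51 2012', 20.0, 0.0, 13.5))
conn.commit()

# Retrieve it later by timestamp.
row = conn.execute("SELECT * FROM samples WHERE timestamp = ?",
                   ('Wed Aug  1 16:48:51 2012',)).fetchone()
print(row)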

Sven Marnach

Writing a mechanism to hold data in memory while it fits and only write it to a file if necessary is redundant – the operating system does this for you anyway. If you use a normal file and access it from the different parts of your application, the operating system will keep the file contents in the disk cache as long as enough memory is available.

If you want to have access to the file by memory addresses, you can memory-map it using the mmap module. However, my impression is that all you need is a standard database, or one of the simpler alternatives offered by the Python standard library, such as the shelve and anydbm modules.
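For instance, shelve gives you a persistent, dict-like object keyed by strings, which lines up with the timestamp-keyed records in the question. A minimal sketch (the filename is just a placeholder):

import shelve

# Open (or create) a shelf; it behaves like a dict whose keys must be strings.
db = shelve.open('acquisition_data')

# Store a record under its timestamp, just like the dicts in the question.
db['Wed Aug  1 16:48:51 2012'] = ['20.0000', '0.0000', '13.5', '75.62', '8190',
                                  '1640', '240', '-13', '79.40']

# Retrieve it later by timestamp.
print(db['Wed Aug  1 16:48:51 2012'])

db.close()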

Based on your comments, also check out key-value stores like Redis and memcached.
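A rough sketch of that idea using the redis-py client, assuming a Redis server is running locally and the redis package is installed:

import redis

# Connect to a local Redis server (these are the default host and port).
r = redis.Redis(host='localhost', port=6379)

# Store a record as a comma-separated string under its timestamp key.
r.set('Wed Aug  1 16:48:51 2012',
      '20.0000,0.0000,13.5,75.62,8190,1640,240,-13,79.40')

# A separate process (e.g. the analysis code) can fetch it by the same key.
print(r.get('Wed Aug  1 16:48:51 2012'))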