Is there a way to take a very large amount of data on disk (a few hundred GB) and interact with it on disk as a pandas DataFrame?
Here's what I've done so far:
1. Described the data using PyTables, following this example: http://www.pytables.org/usersguide/introduction.html
2. Ran a test by loading a portion of the data (a few GB) into an HDF5 file
3. Converted the data into a DataFrame using pd.DataFrame.from_records()
This last step loads all of the data into memory (a rough sketch of what I'm doing is below).
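Roughly what that looks like; the file name and node path are placeholders:

```python
import tables
import pandas as pd

# Open the HDF5 file that was written with PyTables
h5 = tables.open_file("data.h5", mode="r")
table = h5.root.mygroup.mytable  # placeholder node path

# table.read() pulls the entire table into a NumPy record array,
# so the whole dataset ends up in memory before the DataFrame is built
df = pd.DataFrame.from_records(table.read())

h5.close()
```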
I've looked for a way to describe the data as a pandas DataFrame in step 1, but haven't found a good set of instructions for doing that. Is what I want to do feasible?
blaze is a nice way to interact with out-of-core data using lazy expression evaluation. It uses pandas and PyTables under the hood (as well as a host of conversions via odo).
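A minimal sketch of what that can look like; the file path and column names are made up, and the exact spelling of the entry point (blaze.data vs. blaze.Data) depends on the blaze version you have installed:

```python
import blaze as bz
import pandas as pd
from odo import odo

# Point blaze at a table inside an HDF5 file; nothing is read yet
# ("data.h5::/mygroup/mytable" is a placeholder URI)
d = bz.data('data.h5::/mygroup/mytable')

# Build a lazy expression: filter rows, then project two columns.
# Still no data has been loaded into memory.
expr = d[d.amount > 100][['id', 'amount']]

# Materialize only the (hopefully small) result as a pandas DataFrame
result = odo(expr, pd.DataFrame)
```

The point is that the filtering and projection are pushed down to the on-disk store, so only the reduced result is ever converted into an in-memory DataFrame.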