Pandas for Large Data Sets: Millions of records

411 views Asked by As3adTintin At 16 June 2015 at 20:31

I have a dataset in Stata that is about 5.8 million rows (records).

I've been learning pandas the past few months and really enjoy its capabilities. Would pandas still work in this scenario?

I am having trouble reading the dataset into a DataFrame. I'm currently looking at chunking:

chunks = pd.read_stata('data.dta', chunksize=100000, columns=['year', 'race', 'app'])

Is there a better way to go about this? I am hoping to do something like:

import pandas as pd

df = pd.read_stata('data.dta')
data = df.groupby(['year', 'race']).sum()
data.to_csv('data.csv')

But that does not work, I think because the dataset is too large. The error is:

OverflowError: Python int too large to convert to C long
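For reference, here is a minimal sketch of how the chunked read could be combined with the groupby, so that only one chunk is in memory at a time. It assumes a recent pandas on Python 3, that 'app' is a numeric column to be summed, and that the sum is the only aggregate needed. The function name aggregate_in_chunks and the small sample file are hypothetical, there only to make the sketch runnable. It is not confirmed to avoid the OverflowError above, whose cause may lie elsewhere in the file.

```python
import os
import tempfile

import pandas as pd


def aggregate_in_chunks(path, chunksize=100000):
    """Sum 'app' by (year, race), reading the .dta file chunk by chunk.

    Hypothetical helper: assumes 'app' is numeric.
    """
    partial_sums = []
    # With chunksize set, read_stata returns a reader that yields DataFrames.
    with pd.read_stata(path, chunksize=chunksize,
                       columns=['year', 'race', 'app']) as reader:
        for chunk in reader:
            partial_sums.append(chunk.groupby(['year', 'race'])['app'].sum())
    # A group can span several chunks, so the partial sums are added up
    # again per (year, race) after concatenation.
    combined = pd.concat(partial_sums)
    return combined.groupby(level=['year', 'race']).sum()


# Tiny made-up sample standing in for data.dta, so the sketch runs as-is.
sample = pd.DataFrame({
    'year': [2014, 2014, 2014, 2015, 2015],
    'race': ['a', 'b', 'a', 'a', 'b'],
    'app': [1, 2, 3, 4, 5],
})

with tempfile.TemporaryDirectory() as tmpdir:
    dta_path = os.path.join(tmpdir, 'sample.dta')
    sample.to_stata(dta_path, write_index=False)
    result = aggregate_in_chunks(dta_path, chunksize=2)
    result.to_csv(os.path.join(tmpdir, 'data.csv'))
```

Summing the per-chunk results a second time works because a sum of sums equals the overall sum. Other aggregates, such as a mean, would need the sum and count carried separately through the chunks.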

Thanks. Cheers


There are 0 answers
