Pandas multilevel concat/group/chunking

I'm trying to run a groupby on a large dataset by reading it in chunks.

What works:

import pandas as pd

chunks = pd.read_stata('data.dta', chunksize=50000, columns=['year', 'race', 'app'])
pieces = [chunk.groupby(['race'])['app'].agg(['sum']) for chunk in chunks]
agg = pd.concat(pieces).groupby(level=0).sum()

What doesn't work (error: 'Categorical' object has no attribute 'flags'):

chunks = pd.read_stata('data.dta', chunksize=50000, columns=['year', 'race', 'app'])
pieces = [chunk.groupby(['year', 'race'])['app'].agg(['sum']) for chunk in chunks]
agg = pd.concat(pieces).groupby(['year', 'race']).sum()

Any thoughts on what I'm missing when I add year?

pieces:

 2013  Asian       9325
       Black       2655
       AmInd        118
       Hisp        6371
       White      16825
       Other       2446
       Unknown     3502
       Foreign     7280
 Name: app, dtype: float64, year  race   
 2013  Asian       8884
       Black       2969
       AmInd         72
       Hisp        3760
       White      18926
       Other       1843
       Unknown     3262
       Foreign     8183
 Name: app, dtype: float64, year  race   
 2013  Asian       6429
       Black       2176
       AmInd         89
       Hisp        3804
       White      13903
       Other       1752
       Unknown     2760
       Foreign     6825
 2014  Asian       1522
       Black        738
       AmInd         23
       Hisp        1133
       White       4243
       Other        437
       Unknown      316
       Foreign     1997
 Name: app, dtype: float64, year  race   
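
For reference, here is a minimal sketch of one way the two-key version might be made to work. It rests on two assumptions that I have not confirmed against the real file: that the error comes from the categorical key columns that read_stata produces for labelled variables, and that after pd.concat the keys are index levels, so the second groupby should use level= just like the single-key version does. The helper name aggregate_chunks and the in-memory chunks are hypothetical stand-ins for the Stata reader.

```python
import pandas as pd


def aggregate_chunks(chunks, keys, value):
    """Sum `value` per combination of `keys` across an iterable of DataFrames."""
    pieces = []
    for chunk in chunks:
        chunk = chunk.copy()
        # Assumption: categorical keys trigger the error, so cast them
        # to plain strings before grouping.
        for key in keys:
            if isinstance(chunk[key].dtype, pd.CategoricalDtype):
                chunk[key] = chunk[key].astype(str)
        pieces.append(chunk.groupby(keys)[value].sum())
    # After concat the keys live in the index, so regroup by index level.
    return pd.concat(pieces).groupby(level=keys).sum()


# Hypothetical stand-ins for the chunks yielded by pd.read_stata(..., chunksize=...).
chunk_a = pd.DataFrame({
    'year': [2013, 2013, 2013],
    'race': pd.Categorical(['Asian', 'Black', 'Asian']),
    'app': [1.0, 2.0, 3.0],
})
chunk_b = pd.DataFrame({
    'year': [2013, 2014, 2014],
    'race': pd.Categorical(['Asian', 'White', 'White']),
    'app': [4.0, 5.0, 6.0],
})

total = aggregate_chunks([chunk_a, chunk_b], ['year', 'race'], 'app')
print(total)
```

With the real data, the first argument would be the reader returned by pd.read_stata. Casting to strings drops the category ordering, so if that ordering matters, the result's race level may need to be converted back afterwards.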