I'm trying to run a groupby on a large dataset by reading it in chunks.
What works:
import pandas as pd

chunks = pd.read_stata('data.dta', chunksize=50000, columns=['year', 'race', 'app'])
pieces = [chunk.groupby(['race'])['app'].agg(['sum']) for chunk in chunks]
agg = pd.concat(pieces).groupby(level=0).sum()
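To make the pattern concrete without the real file, here is a minimal sketch on made-up in-memory data. The frame `df`, the helper `iter_chunks`, and the chunk size of 4 are all stand-ins I invented for illustration; the real code reads the chunks from `pd.read_stata` instead.

```python
import pandas as pd

# Made-up stand-in for data.dta (assumption: the real columns look roughly like this).
df = pd.DataFrame({
    'year': [2013, 2013, 2013, 2014, 2014, 2014],
    'race': ['Asian', 'Black', 'Asian', 'Black', 'Asian', 'Black'],
    'app': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})


def iter_chunks(frame, chunksize):
    """Hypothetical helper: yield consecutive row slices, mimicking a chunked reader."""
    for start in range(0, len(frame), chunksize):
        yield frame.iloc[start:start + chunksize]


# Partial sums per chunk, then a second groupby on the index to merge the partials.
partials = [chunk.groupby('race')['app'].sum() for chunk in iter_chunks(df, 4)]
chunked_total = pd.concat(partials).groupby(level=0).sum()

# Single-pass result over the whole frame, for comparison.
full_total = df.groupby('race')['app'].sum()
```

The second groupby is needed because the same race can show up in more than one chunk, so the concatenated partial sums contain repeated index labels.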
What doesn't work (error: Categorical objects has no attribute flags):
chunks = pd.read_stata('data.dta', chunksize=50000, columns=['year', 'race', 'app'])
pieces = [chunk.groupby(['year', 'race'])['app'].agg(['sum']) for chunk in chunks]
agg = pd.concat(pieces).groupby(['year', 'race']).sum()
Any thoughts on what I'm missing when adding year?
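One thing that might be worth trying, sketched here on made-up in-memory data rather than the real file: cast the categorical column to plain strings before grouping, and do the final groupby on the index levels instead of column names. This assumes that `race` comes back from `read_stata` as a categorical and that the categorical dtype is what triggers the error. I have not confirmed either on the actual data, and the names `demo_df`, `partial_sums`, and `demo_pieces` are purely illustrative.

```python
import pandas as pd

# Made-up data; 'race' is categorical here to imitate labelled Stata values (assumption).
demo_df = pd.DataFrame({
    'year': [2013, 2013, 2013, 2014, 2014, 2014],
    'race': pd.Categorical(['Asian', 'Black', 'Asian', 'Black', 'Asian', 'Black']),
    'app': [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

keys = ['year', 'race']


def partial_sums(chunk):
    """Hypothetical helper: sum 'app' per (year, race) within one chunk."""
    # Replace the categorical column with plain strings before grouping.
    plain = chunk.assign(race=chunk['race'].astype(str))
    return plain.groupby(keys)['app'].sum()


# Slice the frame into chunks of 4 rows to imitate a chunked reader.
demo_pieces = [partial_sums(demo_df.iloc[i:i + 4]) for i in range(0, len(demo_df), 4)]

# After concat, year and race are index levels, so group on the level names.
demo_total = pd.concat(demo_pieces).groupby(level=keys).sum()
```

Another option might be passing `convert_categoricals=False` to `pd.read_stata`, which should return the raw codes instead of labelled categoricals, at the cost of losing the readable race names.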
pieces:
2013  Asian       9325
      Black       2655
      AmInd        118
      Hisp        6371
      White      16825
      Other       2446
      Unknown     3502
      Foreign     7280
Name: app, dtype: float64,
year  race
2013  Asian       8884
      Black       2969
      AmInd         72
      Hisp        3760
      White      18926
      Other       1843
      Unknown     3262
      Foreign     8183
Name: app, dtype: float64,
year  race
2013  Asian       6429
      Black       2176
      AmInd         89
      Hisp        3804
      White      13903
      Other       1752
      Unknown     2760
      Foreign     6825
2014  Asian       1522
      Black        738
      AmInd         23
      Hisp        1133
      White       4243
      Other        437
      Unknown      316
      Foreign     1997
Name: app, dtype: float64,
year  race