pandas group by year, id and do some statistics per id? with multiindex

Question

pandas group by year, id and do some statistics per id? with multiindex

653 views Asked by user931363 At 04 January 2017 at 15:42

I have a dataframe that is the child of a merge operation on two dataframes. I end up with a multi-index that looks like (timestamp,id) and for the sake of argument, a single column X.

I would like to do several statistics on X by year, and by ID. Instead of posting all the crazy errors that I am getting trying to blindly solve this problem, I ask instead "how would you do this?"

There is one row of X per id, per period (daily). I want to aggregate to an annual period.

Original Q&A

There are 2 answers

**jezrael** · Answer 1 · 2017-01-04T15:48:44+00:00

I think you can use groupby with resample and aggregate e.g. sum, but need pandas 0.18.1:

start = pd.to_datetime('2016-12-28')
rng = pd.date_range(start, periods=10)
df = pd.DataFrame({'timestamp': rng, 'X': range(10), 
                   'id': ['a'] * 3 + ['b'] * 3 + ['c'] * 4 })  
df = df.set_index(['timestamp','id'])
print (df)
               X
timestamp  id   
2016-12-28 a   0
2016-12-29 a   1
2016-12-30 a   2
2016-12-31 b   3
2017-01-01 b   4
2017-01-02 b   5
2017-01-03 c   6
2017-01-04 c   7
2017-01-05 c   8
2017-01-06 c   9

df = df.reset_index(level='id')
print (df.groupby('id').resample('A')['X'].sum())
id  timestamp 
a   2016-12-31     3
b   2016-12-31     3
    2017-12-31     9
c   2017-12-31    30
Name: X, dtype: int32

Another solution is use get_level_values with groupby:

print (df.X.groupby([df.index.get_level_values('timestamp').year,
                     df.index.get_level_values('id')])
           .sum())
      id
2016  a      3
      b      3
2017  b      9
      c     30
Name: X, dtype: int32

**Ted Petrou** · Answer 2 · 2017-01-04T15:54:51+00:00

Ted Petrou On 04 January 2017 at 15:54

If you want to ensure that the groups happen together then you must place all the groups in the groupby.

Assuming your timestamp is in the left outer group, the following should work.

df.groupby([pd.TimeGrouper('A', level=0), pd.Grouper(level='id')])['X'].sum()

TechQA.

pandas group by year, id and do some statistics per id? with multiindex

There are 2 answers

Related Questions in PYTHON

Related Questions in PANDAS

Related Questions in PANDAS-GROUPBY

Popular Questions

Popular Tags

Trending Questions