New row which is the difference of two rows using Groupby in a pandas dataframe

333 views Asked by At

I have a dataframe

import pandas as pd
df = pd.DataFrame({ 
'ID': [1, 1, 1, 2, 2, 2], 
'value': [100, 120, 130, 200, 190, 210],
'value2': [2100, 2120, 2130, 2200, 2190, 2210],   
'state': ['init','mid', 'final', 'init', 'mid', 'final'], 
})

I want to add another row, which is the difference of 'final' and 'init' in the state column for each ID.

The result should look like

import pandas as pd
df = pd.DataFrame({ 
    'ID': [1, 1, 1, 2, 2, 2, 1, 2], 
    'value': [100, 120, 130, 200, 190, 210, 30, 10],
    'value2': [2100, 2120, 2130, 2200, 2190, 2210, 100, 10],   
     'state': ['init','mid', 'final', 'init', 'mid', 'final', 'diff', 'diff'], 
 })

I have tried the following. But I get AttributeError: 'function' object has no attribute 'groupby' error

df1 = df.copy()

df1 = df[df.state.isin(['final', 'init'])].copy
s = df1.groupby('ID', sort=False).diff().dropna.assign(id=df1['ID'].unique(), state='diff')
df = df.append(s, sort=True).sort_values("ID")
2

There are 2 answers

0
tdy On BEST ANSWER

What about something like this? Group by ID, keep only the value* cols, take the diff, and reassign ID and state:

df
ID value value2 state
0 1 100 2100 init
1 1 120 2120 mid
2 1 130 2130 final
3 2 200 2200 init
4 2 190 2190 mid
5 2 210 2210 final
diffs = (df[df.state.isin(['init', 'final'])]
    .groupby('ID')[['value', 'value2']]
    .diff().dropna()
    .assign(ID=df.ID.unique(), state='diff')
)
ID value value2 state
0 1 30.0 30.0 diff
1 2 10.0 10.0 diff

Then combine back into df:

df = df.append(diffs)
ID value value2 state
0 1 100.0 2100.0 init
1 1 120.0 2120.0 mid
2 1 130.0 2130.0 final
3 2 200.0 2200.0 init
4 2 190.0 2190.0 mid
5 2 210.0 2210.0 final
0 1 30.0 30.0 diff
1 2 10.0 10.0 diff
2
Miguel Trejo On

Another option is to take the difference betweent the final state and the init as a matrix operation and then fill the missing columns ID and state passing a dictionary of values.

val_cols = ['value', 'value2']
df_final = df.query('state == "final"')[val_cols].reset_index(drop=True) - df.query('state== "init"')[val_cols].reset_index(drop=True)

df_final = pd.concat([df, df_final])

values = {'ID': {i:i+1 for i, v in enumerate(df.ID.unique())}, 'state': 'diff'}
df_final.fillna(values, inplace=True)

Final Result

enter image description here