How to calculate difference for every unique column value based on the next date in dataframe for that unique column value?


I have a df like:

date       | prod_number | prod_count | prod_factor
2018-01-01 | 1           | 5          | 3
2018-02-01 | 1           | 20         | 3
2018-04-01 | 1           | 10         | 3
2019-09-01 | 2           | 8          | 5
2018-09-02 | 2           | 7          | 5
2018-10-03 | 2           | 10         | 5

For every "prod_number" I want the change in prod_count since the previous date, and then that change multiplied by prod_factor:

The first entry for each "prod_number" has nothing to calculate the difference on so it's NONE or 0, whatever is easier.


date       | prod_number | prod_count | prod_factor | change      | prod_factor*change
2018-01-01 | 1           | 5          | 3           | NONE/0      | NONE/0
2018-02-01 | 1           | 20         | 3           | 15 # 20-5   | 45  # 3*15
2018-04-01 | 1           | 10         | 3           | -10 # 10-20 | -30 # 3*-10

2019-09-01 | 2           | 8          | 5           | NONE/0      | NONE/0
2018-09-02 | 2           | 7          | 5           | -1 # 7-8    | -5  # 5*-1
2018-10-03 | 2           | 10         | 5           | 3 # 10-7    | 15  # 5*3

How can I achieve this with pandas?


There are 2 answers


Use groupby.diff, then multiply the result by prod_factor:

df['change'] = df.groupby('prod_number')['prod_count'].diff()
df['prod_factor*change'] = df['change'] * df['prod_factor']

         date  prod_number  prod_count  prod_factor  change  prod_factor*change
0  2018-01-01            1           5            3     NaN                 NaN
1  2018-02-01            1          20            3    15.0                45.0
2  2018-04-01            1          10            3   -10.0               -30.0
3  2019-09-01            2           8            5     NaN                 NaN
4  2018-09-02            2           7            5    -1.0                -5.0
5  2018-10-03            2          10            5     3.0                15.0
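
The question says NONE or 0 is fine for the first row of each product. If 0 is preferred over NaN, one option is to chain fillna(0) after the diff. This is a minimal sketch that rebuilds the question's table in its given row order; the DataFrame construction is my assumption about how the data is loaded.

```python
import pandas as pd

# Sample data copied from the question, in the order given there.
df = pd.DataFrame({
    'date': ['2018-01-01', '2018-02-01', '2018-04-01',
             '2019-09-01', '2018-09-02', '2018-10-03'],
    'prod_number': [1, 1, 1, 2, 2, 2],
    'prod_count': [5, 20, 10, 8, 7, 10],
    'prod_factor': [3, 3, 3, 5, 5, 5],
})

# Row-to-row difference within each product; the first row of each
# group has no predecessor, so diff() yields NaN and fillna turns it into 0.
df['change'] = df.groupby('prod_number')['prod_count'].diff().fillna(0)
df['prod_factor*change'] = df['change'] * df['prod_factor']

print(df)
```

Note that diff() works on row order, not on the date column, so this only matches "change since the previous date" if each product's rows are already in the intended order.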

You can use np.where and diff():

import pandas as pd
import numpy as np

df = pd.DataFrame(
    [['2018 - 01 - 01', 1, 5, 3], ['2018 - 02 - 01', 1, 20, 3], ['2018 - 04 - 01', 1, 10, 3],
     ['2019 - 09 - 01', 2, 8, 5], ['2018 - 09 - 02', 2, 7, 5], ['2018 - 10 - 03', 2, 10, 5]],
    columns=['date', 'prod_number', 'prod_count', 'prod_factor'])
df['change'] = np.where(
    df['prod_number'].diff() == 0,  # condition: same prod_number as the previous row
    df['prod_count'].diff(),        # value if true: change in prod_count
    0)                              # otherwise 0 (first row of each product)

                 date  prod_number  prod_count  prod_factor  change
0  2018 - 01 - 01            1           5            3     0.0
1  2018 - 02 - 01            1          20            3    15.0
2  2018 - 04 - 01            1          10            3   -10.0
3  2019 - 09 - 01            2           8            5     0.0
4  2018 - 09 - 02            2           7            5    -1.0
5  2018 - 10 - 03            2          10            5     3.0
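
The check df['prod_number'].diff() == 0 only works when all rows of a product sit next to each other. If that is not guaranteed, a possible approach is to sort by product and date first and then apply the same np.where step. The sketch below uses made-up, interleaved rows (not the question's exact table) purely to illustrate this, and also adds the prod_factor*change column the question asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical interleaved rows, for illustration only.
df = pd.DataFrame({
    'date': ['2018-02-01', '2018-09-02', '2018-01-01', '2018-10-03', '2018-04-01'],
    'prod_number': [1, 2, 1, 2, 1],
    'prod_count': [20, 7, 5, 10, 10],
    'prod_factor': [3, 5, 3, 5, 3],
})

# Parse the dates so they sort chronologically, then group products together.
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(['prod_number', 'date']).reset_index(drop=True)

# Same idea as above: take the difference only where the previous row
# belongs to the same product, otherwise use 0.
df['change'] = np.where(
    df['prod_number'].diff() == 0,
    df['prod_count'].diff(),
    0,
)
df['prod_factor*change'] = df['change'] * df['prod_factor']

print(df)
```

After sorting, each product's rows are contiguous and in date order, so the first row of each product gets 0 and the rest get the change from the previous date.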