How to forward fill by group and conditional on other columns in python

86 views Asked by At

The original dataset is:

Group Year Value
A 1990 NaN
A 1992 1
A 1995 NaN
A 1997 NaN
A 1998 NaN
A 2001 NaN
A 2002 1
B 1991 1
B 1992 NaN
B 1995 NaN
B 1998 NaN
B 2001 1
B 2002 NaN

I want to do forward fill by group and conditional on the value of column 'Year': forward fill missing value until the 'Year' is more than five years apart. For example, the value for group A in Year 1992 is 1, so the value for group A in 1995 should be forward filled with 1 since 1995-1992=3 <= 5; and the value for group A in 1997 should be forward filled with 1 since 1995-1992=3 <= 5; and the value for group A in 1998 should not be forward filled with 1 since 1998-1992=6 > 5.

The dataset I want is as follows:

Group Year Value
A 1990 NaN
A 1992 1
A 1995 1
A 1997 1
A 1998 NaN
A 2001 NaN
A 2002 1
B 1991 1
B 1992 1
B 1995 1
B 1998 NaN
B 2001 1
B 2002 1
2

There are 2 answers

0
mozway On

You can use a double groupby.ffill and mask with where:

# identify rows within 5 of the previous non-NA value
m = (df['Year'].where(df['Value'].notna())
     .groupby(df['Group']).ffill()
     .rsub(df['Year']).le(5)
    )

# groupby.ffill and mask
df['Value'] = df.groupby('Group')['Value'].ffill().where(m)

Output:

   Group  Year  Value
0      A  1990    NaN
1      A  1992    1.0
2      A  1995    1.0
3      A  1997    1.0
4      A  1998    NaN
5      A  2001    NaN
6      A  2002    1.0
7      B  1991    1.0
8      B  1992    1.0
9      B  1995    1.0
10     B  1998    NaN
11     B  2001    1.0
12     B  2002    1.0
0
Mykola Zotko On

You can use cummax to identify 5-year ranges:

x = df['Year'].mask(df['Value'].notna(), df['Year'] + 5).groupby(df['Group']).cummax()

df['Value'] = df.groupby([df['Group'], x])['Value'].ffill()

Result:

   Group  Year  Value
0      A  1990    NaN
1      A  1992    1.0
2      A  1995    1.0
3      A  1997    1.0
4      A  1998    NaN
5      A  2001    NaN
6      A  2002    1.0
7      B  1991    1.0
8      B  1992    1.0
9      B  1995    1.0
10     B  1998    NaN
11     B  2001    1.0
12     B  2002    1.0