Forward fill only certain value

1.9k views Asked by At

I have an array which represents object states, where 0 - object is off, and 1 - object is on.

import pandas as pd
import numpy as np

s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan]
df = pd.DataFrame(s, columns=["s"])
df
      s
0   NaN
1   0.0
2   NaN
3   NaN
4   1.0
5   NaN
6   NaN
7   0.0
8   NaN
9   1.0
10  NaN

I need to forward will only 0-values in it, like below.

>>> df_wanted
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN

After browsing similar queations here, I just compare ffill-ed and bfill-ed values and assign back with a mask:

mask = (df.ffill() == 0) & (df.bfill() == 1)
df[mask] = 0
df
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN

But it won't help if any 0 value is not followed by 1. What could be more elegant solution that takes such cases into account?

2

There are 2 answers

1
Ank On BEST ANSWER

mask = (df.ffill() == 0) should only be suffice to fulfill your usecase.

Firstly, df.ffill will propagate the last valid observation forward. So rows followed by 0 will be filled by 0s, and rows followed by 1 will be filled by 1s. Compare that to 0 to select rows with 0s only and use it as mask to get your final df.

Example: (Added a 0 and few NaNs to the end of your df)

>>> s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, np.nan, 0, np.nan, np.nan, np.nan]
>>> df = pd.DataFrame(s, columns=["s"])
>>> df
      s
0   NaN
1   0.0
2   NaN
3   NaN
4   1.0
5   NaN
6   NaN
7   0.0
8   NaN
9   1.0
10  NaN
11  NaN
12  0.0
13  NaN
14  NaN
15  NaN
>>> 
>>> 
>>> df[df.ffill() == 0] = 0
>>> df
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN
11  NaN
12  0.0
13  0.0
14  0.0
15  0.0
0
Sina Meftah On

One way, maybe not much elegant but that works for you, would be to just ffill with everything and then pick from it where your original series was NaN and your ffilled series is 0.

sf = df.ffill().values[:, 0]
desired = np.where(np.isnan(s) & (sf==0), sf, s)

pandas has a where function too, I'm just more comfortable with numpy since it's more versatile.