I have the following data:
4/23/2021 493107
4/26/2021 485117
4/27/2021 485117
4/28/2021 485117
4/29/2021 485117
4/30/2021 485117
5/7/2021 484691
I want it to look like the following:
4/23/2021 493107
4/24/2021 485117
4/25/2021 485117
4/26/2021 485117
4/27/2021 485117
4/28/2021 485117
4/29/2021 485117
4/30/2021 485117
5/1/2021 484691
5/2/2021 484691
5/3/2021 484691
5/4/2021 484691
5/5/2021 484691
5/6/2021 484691
5/7/2021 484691
So each missing date should be filled with the value from the next available date below it. I tried the following code:
df['Date']=pd.to_datetime(df['Date'].astype(str), format='%m/%d/%Y')
df.set_index(df['Date'], inplace=True)
df = df.resample('D').sum().fillna(0)
df['crude'] = df['crude'].replace({ 0:np.nan})
df['crude'].fillna(method='ffill', inplace=True)
However, this fills each missing date with the value from the date above it instead, and I get the following:
4/23/2021 493107
4/24/2021 493107
4/25/2021 493107
4/26/2021 485117
4/27/2021 485117
4/28/2021 485117
4/29/2021 485117
4/30/2021 485117
5/1/2021 485117
5/2/2021 485117
5/3/2021 485117
5/4/2021 485117
5/5/2021 485117
5/6/2021 485117
5/7/2021 969382
This does not match the output I need.
Try filling the 0 values with bfill instead of ffill:
df
:
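For reference, here is a minimal sketch of the backfill idea using the sample data from the question. The column names `Date` and `crude` are taken from the asker's code; the variable name `daily` and the use of `.first()` instead of `.sum()` are my own assumptions, not necessarily what the answer above intended. With `.first()`, days that have no rows come out as NaN rather than 0, so there is no need to replace zeros before filling.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["4/23/2021", "4/26/2021", "4/27/2021", "4/28/2021",
             "4/29/2021", "4/30/2021", "5/7/2021"],
    "crude": [493107, 485117, 485117, 485117, 485117, 485117, 484691],
})

df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Resample to one row per calendar day. first() keeps the first value seen
# on each day and leaves NaN for days that had no rows at all.
daily = df.set_index("Date").resample("D").first()

# Backfill: each missing day takes the value of the next available day.
daily["crude"] = daily["crude"].bfill().astype(int)

print(daily)
```

Note that the 969382 on 5/7/2021 in the question is exactly 2 × 484691, which would be consistent with the real data having two rows for that date, since `sum()` adds up all rows that fall on the same day. That is only a guess from the numbers shown; if it is the case, `.first()` (or `.last()`) avoids the doubling.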