Why does my pandas groupby ffill fill across different groups?

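import pandas as pd

# Keep only the bars inside the time window; copy so later assignments are safe.
df = df.query('Time >= @start_time and Time < @end_time').copy()
df['date'] = pd.to_datetime(df['date'])
# Intended: forward-fill Close per (date, sym), then compute the per-group return.
df['ret'] = (df.groupby(['date', 'sym'])['Close']
               .ffill()
               .pct_change())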

When I use this to calculate returns, the first row of each group's 'ret' is not NaN. Instead it shows a huge number that cannot be the return of a one-minute bar. So I guess pandas is forward-filling across different groups, but I don't know how to fix this.

For example, my DataFrame's columns are date, sym, Time, and Close.

The result is supposed to be

Value
30986938 NaN
30986939 0.000934
30986940 0.001386
30986941 -0.000461
30986942 0.000462
30986943 -0.000180
30986944 0.000180

but it gives

Value
30986938 -0.148827
30986939 0.000934
30986940 0.001386
30986941 -0.000461
30986942 0.000462
30986943 -0.000180
30986944 0.000180
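
The symptom above can be reproduced with a small sketch. The two-symbol frame below is invented for illustration (the symbols, date and prices are not from the real data). It relies on the fact that groupby(...).ffill() returns an ordinary Series on the original index, so a chained pct_change() is no longer grouped, and the first row of the second group is compared with the last row of the first group:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two symbols on one date, three bars each.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-03"] * 6),
    "sym": ["A", "A", "A", "B", "B", "B"],
    "Close": [100.0, np.nan, 102.0, 50.0, 51.0, np.nan],
})

# The forward fill itself is done per group, but the result is a plain Series ...
filled = df.groupby(["date", "sym"])["Close"].ffill()

# ... so pct_change() runs over the whole column, across group boundaries.
leaky = filled.pct_change()

print(leaky)
```

In this sketch the first row of symbol B (index 3) gets a large negative value computed from A's last close instead of NaN, which matches the kind of jump shown in the output above.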

I tried using apply/transform(lambda x: x.ffill()) and groupby(..., as_index=False). None of them worked.

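I found the bug. ffill() was applied per group, but pct_change() was chained onto the resulting plain Series, so it ran across group boundaries. Both steps need to run inside each group, so I should use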

df.groupby(['date', 'sym'])['Close'].apply(lambda x: x.ffill().pct_change())
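
A minimal sketch of the same idea that also assigns the result back to the frame, using invented toy data. It uses transform instead of apply, which is a substitution on my part rather than what the line above does: transform returns a result aligned to the original index, whereas apply on a grouped Series may prepend the group keys to the index depending on the pandas version and the group_keys setting, which can break the assignment to df['ret'].

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two symbols on one date, three bars each.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-03"] * 6),
    "sym": ["A", "A", "A", "B", "B", "B"],
    "Close": [100.0, np.nan, 102.0, 50.0, 51.0, np.nan],
})

# Forward-fill and compute the return inside each (date, sym) group,
# so no value is carried over from the previous group.
df["ret"] = (
    df.groupby(["date", "sym"])["Close"]
      .transform(lambda x: x.ffill().pct_change())
)

print(df)
```

Here the first row of each symbol is NaN, as expected. Note that transform(lambda x: x.ffill()) followed by an outer .pct_change() would still cross groups; the pct_change() call has to sit inside the lambda.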
