df = df.query('Time >= @start_time and Time < @end_time')
df.loc[:, 'date'] = pd.to_datetime(df['date'])
df['ret'] = (df.groupby(['date', 'sym'])['Close']
             .ffill()
             .pct_change())
When I use this to calculate returns, the first row of each group's `ret` is not NaN. Instead it shows a huge number that cannot be the return of a one-minute bar. My guess is that pandas is carrying values across different groups, but I don't know how to fix it.
For example, my DataFrame's columns are date, sym, Time, and Close.
The result is supposed to be
| index | ret |
|---|---|
| 30986938 | NaN |
| 30986939 | 0.000934 |
| 30986940 | 0.001386 |
| 30986941 | -0.000461 |
| 30986942 | 0.000462 |
| 30986943 | -0.000180 |
| 30986944 | 0.000180 |
but it gives
| index | ret |
|---|---|
| 30986938 | -0.148827 |
| 30986939 | 0.000934 |
| 30986940 | 0.001386 |
| 30986941 | -0.000461 |
| 30986942 | 0.000462 |
| 30986943 | -0.000180 |
| 30986944 | 0.000180 |
I tried `apply`/`transform(lambda x: x.ffill())` and `groupby(..., as_index=False)`. None of them worked.
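I found the bug. `groupby(...).ffill()` returns an ordinary Series, so the chained `pct_change()` is no longer grouped and computes the change across group boundaries. Both steps have to run inside each group, so I should use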
df.groupby(['date', 'sym'])['Close'].apply(lambda x: x.ffill().pct_change())
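For anyone who wants to reproduce this, here is a minimal sketch on made-up data (the prices and dates are invented for illustration, not taken from my real DataFrame). It compares the chained version with a per-group version. I use `transform` here instead of `apply` as an assumption about what is safest: `transform` always returns a result aligned with the original index, while in recent pandas versions `apply` may prepend the group keys to the index, which makes assigning the result back to a column awkward.

```python
import numpy as np
import pandas as pd

# Toy data: two trading days for one symbol, with one missing Close.
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-02"] * 3 + ["2020-01-03"] * 3),
    "sym": ["A"] * 6,
    "Close": [100.0, np.nan, 102.0, 50.0, 51.0, 52.0],
})

# Chained version: ffill() is done per group, but it returns a plain
# Series, so pct_change() runs over the whole column and compares the
# first bar of 2020-01-03 with the last bar of 2020-01-02.
chained = df.groupby(["date", "sym"])["Close"].ffill().pct_change()

# Per-group version: both ffill() and pct_change() run inside each
# group, so the first row of every group is NaN.
df["ret"] = (df.groupby(["date", "sym"])["Close"]
             .transform(lambda s: s.ffill().pct_change()))

print(chained)
print(df)
```

In the chained result, the first row of the second day holds the jump from 102 to 50 rather than NaN, which is the same symptom as the -0.148827 above.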