resample a pandas dataframe from today backwards

56 views Asked by At

I have a dataframe and would like to resample it and aggregate over average_period=14 days. The difficult part (for me) is that I would like to have my aggregation bins as starting from today, so [today, today-14], [today-14, today-28], [today-28, today-42], etc. Today's date is always in df but the dates before are not necessarily always there.

If I do the below I get the maximum date 2024-01-23 but it should be 2024-01-13. How can I achieve this?

df = pd.DataFrame({'date_time':['2023-09-19', '2023-09-29', '2023-11-10', '2024-01-13'],
'col1':['0.100', '0.100', '0.100', '0.100'],
'col2':['r', 'r', 'r', 'r'],
'tot':[900, 800, 300, 400],
'hit':[24, 56, 26, 40],
'percent':[33, 23, 33, 31]})

df = df.assign(date_time=pd.to_datetime(df.date_time))

average_period = 14
(df
    .set_index('date_time')
    .groupby(['col1', 'col2']).resample(f'{average_period}D', 
                                              closed='right', 
                                              label='right').agg({'hit':'sum', 
                                                                  'tot':'sum', 
                                                                  'percent':'mean'})
    .reset_index())
1

There are 1 answers

0
Ugochukwu Obinna On BEST ANSWER
Try this, 

import pandas as pd
from datetime import datetime


df = pd.DataFrame({
    'date_time': ['2023-09-19', '2023-09-29', '2023-11-10', '2024-01-13'],
    'col1': ['0.100', '0.100', '0.100', '0.100'],
    'col2': ['r', 'r', 'r', 'r'],
    'tot': [900, 800, 300, 400],
    'hit': [24, 56, 26, 40],
    'percent': [33, 23, 33, 31]
})

# date_time to datetime object
df['date_time'] = pd.to_datetime(df['date_time'])

# Define today's date
today = datetime(2024, 1, 13)

# The average period
average_period = 14

# resampling
result = (df
          .set_index('date_time')
          .groupby(['col1', 'col2'])
          .resample(f'{average_period}D', 
                    closed='right', 
                    label='right',
                    origin=today)  # Set the origin to today's date
          .agg({'hit': 'sum', 
                'tot': 'sum', 
                'percent': 'mean'})
          .reset_index())

result