Time Series Rolling Windows Feature

If I'm creating a Rolling Mean Feature based on my Sales (target) column, is it necessary to shift it?

Let me give an example:

Let's suppose I have days 01 to 10 in my dataset. If I create a 7-day rolling mean column, the row for day 10 will include day 10's own Sales value in its rolling mean. Now, if I'm going to predict day 11, which is tomorrow, I would need the Sales value of that day in order to compute its rolling mean, which makes no sense because that is the value I'm trying to predict.

So in my opinion it makes more sense to always use the last 7 days, not counting the current one.

Can anyone help?

1 Answer

Answered by Sandwichnick

I will assume that you can use the pandas library, as its powerful rolling function can easily accommodate your request.

Consider the following example:

import pandas as pd
my_values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_window_size = 3
# Shift by one row first so that each window ends at the previous value
rolling_mean = my_values.shift(1).rolling(window=my_window_size).mean()
print(rolling_mean)

This results in:

0    NaN
1    NaN
2    NaN
3    2.0
4    3.0
5    4.0
6    5.0
7    6.0
8    7.0
9    8.0

As you can see, the mean of the values at indices [0, 1, 2] is displayed at index 3 ((1+2+3)/3 = 2). The NaNs at the beginning are there because, by default, a rolling window returns NaN until it holds a full window of valid values (for an integer window, min_periods defaults to the window size).
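If the leading NaNs are a problem, one option is the min_periods argument of rolling. The following is only a small sketch on the same toy Series (the name partial_mean is mine); whether partial windows are acceptable as a feature depends on your model.

```python
import pandas as pd

my_values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# min_periods=1 lets the window return a mean as soon as it holds
# at least one valid (non-NaN) value instead of waiting for 3 values.
partial_mean = my_values.shift(1).rolling(window=3, min_periods=1).mean()
print(partial_mean)
```

Index 0 still stays NaN, because after the shift there is no earlier value at all. Index 1 becomes 1.0 (only the value 1 is available) and index 2 becomes 1.5 (the mean of 1 and 2).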

Here we shifted the Series by one position before calculating the rolling transformation, which is the explicit shift you asked about.

In your special case (a shift of 1), the rolling call can be simplified with the closed argument:

import pandas as pd

my_values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_window_size = 3
# closed='left' excludes the current row from each window
rolling_mean = my_values.rolling(window=my_window_size, closed='left').mean()
print(rolling_mean)

This gives the same result:

0    NaN
1    NaN
2    NaN
3    2.0
4    3.0
5    4.0
6    5.0
7    6.0
8    7.0
9    8.0

closed "left" means that the last point in the window, which is the current point, is not part of the window's calculation. (The naming can feel backwards: the window is closed on its left, oldest end and open on its right end, and the right end is the current row. This comes from interval notation in maths, I would just roll with it :D)
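To see the difference side by side, here is a small sketch that compares the default window (which includes the current row) with closed="left" and with the explicit shift. The column names are made up for illustration, and as far as I know fixed integer windows only accept closed from pandas 1.2 onward, so treat that version as an assumption to check.

```python
import pandas as pd

my_values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

comparison = pd.DataFrame({
    "value": my_values,
    # Default for a fixed window: the current row is part of the window.
    "closed_right": my_values.rolling(window=3).mean(),
    # Current row excluded: the window covers the 3 previous rows.
    "closed_left": my_values.rolling(window=3, closed="left").mean(),
    # Explicit shift by one row, then the default window.
    "shift_then_roll": my_values.shift(1).rolling(window=3).mean(),
})
print(comparison)
```

The closed_left and shift_then_roll columns should match row for row, while closed_right is one step "ahead" because it already uses the current value.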

You can find the closed options here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.rolling.html#:~:text=DataFrame%20first%20instead.-,closed,-str%2C%20default%20None
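Applied to the setup from the question (days 1 to 10, a 7-day window, predicting day 11), this could look roughly like the sketch below. The sales numbers, the dates and the column name sales_roll7_prev are all invented for illustration.

```python
import pandas as pd

# Hypothetical sales for days 1 to 10 (made-up numbers).
df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "sales": [10, 12, 11, 13, 15, 14, 16, 18, 17, 19],
})

# Feature for each known day: mean of the 7 previous days,
# with the current day's sales excluded from the window.
df["sales_roll7_prev"] = df["sales"].rolling(window=7, closed="left").mean()

# Feature for day 11, which has no sales value yet:
# the mean of the last 7 known days (days 4 to 10).
day11_feature = df["sales"].tail(7).mean()

print(df)
print(day11_feature)
```

This way the feature for day 11 is built exactly like the training rows: it only uses sales that are already known at prediction time. The first 7 rows have no feature value, since fewer than 7 previous days exist for them.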