I have a dataframe with multiple columns. For each column, I want to return an indicator (1 or -1), changing if the column's rolling mean resets when the current row value is a defined multiple of the rolling mean value. An added complication is that I don't want the reset to occur until the rolling window has reached a minimum length, following the previous reset.
Here's an example with multiple 2, minimum length 4, and a starting indicator value of 1
Starting dataframe:
df = pd.DataFrame(
{
"A": [0.1, 0.1, 0.15, 0.1, 0.1, 0.7, 0.1, 0.1, 0.5, 1, 0.1, 0.1],
"B": [0.1, 0.1, 0.4, 0.1, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1],
},
index=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
)
print(df)
A B
0 0.10 0.1
1 0.10 0.1
2 0.15 0.4
3 0.10 0.1
4 0.10 0.8
5 0.70 0.1
6 0.10 0.1
7 0.10 0.1
8 0.50 0.1
9 1.00 0.1
10 0.10 0.9
11 0.10 0.1
I would like to obtain:
A B
0 1 1
1 1 1
2 1 1
3 1 1
4 1 -1
5 -1 -1
6 -1 -1
7 -1 -1
8 -1 -1
9 1 -1
10 1 1
11 1 1
Looking at column A for example:
- first reset happens on row 5, as 0.7 >= 2 * mean(rows 0 to 5), and we have observed 6 rows
- nothing happens on row 8 as the new rolling mean only has 3 observations
- next reset happens on row 9, as 1 >= 2 * mean(rows 6 to 9), and we have observed 4 rows
I have looked at various posts, the closest being the below, but none seems to really address that problem fully:
You can try:
Prints:
You can speed-up the computation using numba: