Pandas - calculate rolling standard deviation over all columns

101 views Asked by At

Say I have a pd.DataFrame and want to calculate the rolling Standard deviation. In pandas I can use rolling(window=x).std(), but it gives me the SD by column. I however want the standard deviation over all columns in a given row.

As an example consider the pd dataframe

df = pd.DataFrame({'col1': [1,2,3,4,5,6], 'col2': [-1,-2,-3,-4,-5,-6], 'col3': [1,2,3,4,5,6]})
df
   col1  col2  col3
0     1    -1     1
1     2    -2     2
2     3    -3     3
3     4    -4     4
4     5    -5     5
5     6    -6     6

When calculating for a window size of 2, for instance, I would like to have the standard deviation in row 2 as the sum of the two rows divided by 6 (or 6-1), doesn't matter), so: np.std([2,-2,2,1,-1,1]).

I tried to calculate it on a melted dataframe, but I didn't get the result as expected:

df.reset_index().melt(id_vars='index').set_index('index')['value'].rolling(2).std()

Does anyone have an idea how to do it? I appreciate your feedback.

2

There are 2 answers

3
Panda Kim On BEST ANSWER
n = len(df.columns)
window = 2
out = df.stack().rolling(window * n).std().xs(df.columns[-1], level=-1)

out:

0         NaN
1    1.643168
2    2.639444
3    3.656045
4    4.679744
5    5.706721
dtype: float64

out is only std.

you said divide by 6 or 5, so i dont divide anything for you to choose.

0
mozway On

The most efficient in my opinion is to use 's sliding_window_view to form a 3D intermediate and use std on it (be aware that numpy's std has ddof=0 by default and pandas ddof=1):

from numpy.lib.stride_tricks import sliding_window_view as swv

N = 2
out = pd.Series(swv(df.to_numpy(), N, axis=0).std((1,2), ddof=1),
                index=df.index[N-1:])

Output:

1    1.643168
2    2.639444
3    3.656045
4    4.679744
5    5.706721
dtype: float64

As a new column:

from numpy.lib.stride_tricks import sliding_window_view as swv

N = 2
df.loc[df.index[N-1:], 'out'] = (swv(df.to_numpy(), N, axis=0)
                                 .std((1,2), ddof=1)
                                )

Output:

   col1  col2  col3       out
0     1    -1     1       NaN
1     2    -2     2  1.643168
2     3    -3     3  2.639444
3     4    -4     4  3.656045
4     5    -5     5  4.679744
5     6    -6     6  5.706721