I am using pandas 1.2.3 (quite old, I know, but versions are frozen for a certain project). In this project, I am using the following line of code: df.rolling(n).mean(skipna=False)
With pandas 1.5+ this gives the following warning: FutureWarning: Passing additional kwargs to Rolling.mean has no impact on the result and is deprecated. This will raise a TypeError in a future version of pandas.
However, I cannot find any recommendation on the web for how to replicate the old behavior one-to-one with newer versions.
Thank you for your help!
If you are using Pandas 1.2.3, the skipna argument does nothing in the context of a rolling window mean.
This might be good news, because all you have to do is delete skipna=False, and your code will be compatible with Pandas 1.5. It might also be bad news if your code depends on skipna, because that could indicate a bug: the code expects skipna to do something. (This is not really a problem for skipna=False, as that's the default, but if you have skipna=False, you may also have skipna=True somewhere else.)
I have three pieces of evidence for skipna doing nothing in the context of a rolling window mean.
Test program
First, the following test program tries to take the mean of a series with a missing value, with skipna set to both True and False. I ran this test program using Pandas 1.2.3.
Output:
The outputs for both are the same. If it were skipping NA values while computing the mean, then none of the values in positions 2 through 6 would be NA. Effectively, this means that a rolling mean does not skip NA values whether or not skipna is set.
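As a rough illustration of that reasoning, the sketch below compares what pandas returns for a plain rolling mean with what a NaN-skipping mean would return, computed by hand with np.nanmean. The sample data, the window size of 3, and the variable names are my own assumptions, not the data from the test program above. No skipna argument is passed, so it should run on old and new pandas alike.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: seven values with one missing entry at position 3.
values = [1.0, 2.0, 3.0, np.nan, 5.0, 6.0, 7.0]
s = pd.Series(values)
window = 3

# What pandas computes for a plain rolling mean (no skipna argument).
rolled = s.rolling(window).mean()
nan_positions = [i for i, v in enumerate(rolled) if np.isnan(v)]

# What a NaN-skipping mean would give for each full window (positions 2..6),
# computed by hand for comparison.
skipping = [
    float(np.nanmean(values[i - window + 1 : i + 1]))
    for i in range(window - 1, len(values))
]

print(nan_positions)  # [0, 1, 3, 4, 5]
print(skipping)       # [2.0, 2.5, 4.0, 5.5, 6.0]
```

Positions 0 and 1 are NA only because the window is not yet full. Positions 3 through 5 are NA because their windows contain the missing value, which is exactly what a skipping mean would have avoided.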
Documentation
Second, I read what the documentation for Pandas 1.2 says about skipna. Here, we need to be careful that we are reading the right page. What we want is not DataFrame.mean(), but pandas.core.window.rolling.Rolling.mean().
This page does not document a skipna parameter. Presumably, if somebody went through the trouble of writing code to implement skipna, they would have documented it. On the other hand, it does have a catch-all **kwargs parameter, which is unhelpfully documented as "under review."
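The same distinction between the two methods can be checked from Python rather than from the docs. This is a minimal sketch using the standard library's inspect.signature; the variable names are mine, and the exact parameter lists differ between pandas versions, so only the presence or absence of skipna is compared.

```python
import inspect

import pandas as pd
from pandas.core.window.rolling import Rolling

# Named parameters of each method in the installed pandas version.
frame_params = inspect.signature(pd.DataFrame.mean).parameters
rolling_params = inspect.signature(Rolling.mean).parameters

# DataFrame.mean declares skipna by name; Rolling.mean does not.
print("skipna" in frame_params)
print("skipna" in rolling_params)

# Older versions may still list a catch-all **kwargs on Rolling.mean.
print(list(rolling_params))
```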
Might skipna have an effect, just one that does not show up in my test program, and one which is not documented?
Reading the Pandas code
The third approach I used to investigate this was to run the code under a debugger.
By running the test program under a debugger, I found the code within Pandas which is responsible for computing the rolling mean. This code does not implement any kind of skipna functionality.
Here is how rolling mean is implemented.
1. df.rolling(3).mean() is called, which calls
2. pandas/core/window/rolling.py:Rolling.mean(), which calls
3. pandas/core/window/rolling.py:RollingAndExpandingMixin.mean(), which calls
4. pandas/core/window/rolling.py:BaseWindow._apply(), which calls
5. pandas/_libs/window/aggregations.pyx:roll_mean(), which computes the rolling mean.
There are two things in this chain that make it impossible for skipna to do anything.
First, skipna is carried in the kwargs variable up until BaseWindow._apply() is called. But in that function, the kwargs parameter is unused. Even if roll_mean() implemented skipna, there is no way for the skipna parameter to get to roll_mean().
Second, if you read the source code of roll_mean(), it takes no skipna option, so nothing in it could switch NA skipping on or off. In order for skipna to do something, there would need to be code here to implement it, and there isn't.
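The way a keyword argument can be accepted and then silently dropped is easy to reproduce in plain Python. The functions below are hypothetical stand-ins that mirror the shape of the call chain described above; they are not the pandas source, and the simplified mean ignores details such as min_periods.

```python
def roll_mean_sketch(values, window):
    """Hypothetical stand-in for the low-level routine: it has no skipna option."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # window not full yet
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def apply_sketch(values, window, **kwargs):
    # kwargs is accepted here but never read or forwarded,
    # so anything passed in it is lost at this point.
    return roll_mean_sketch(values, window)


def mean_sketch(values, window, **kwargs):
    # The public entry point passes kwargs along untouched.
    return apply_sketch(values, window, **kwargs)


data = [1.0, 2.0, 3.0, 4.0]
with_flag = mean_sketch(data, 2, skipna=True)
without_flag = mean_sketch(data, 2)
print(with_flag == without_flag)  # True
```

Because the middle function never forwards kwargs, no value of skipna can change the result, whatever the innermost function does.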
Conclusion
The skipna parameter to df.rolling(3).mean(skipna=False) used to do nothing. It still does nothing, but now it complains about it.
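In practice, then, the migration amounts to dropping the argument. The sketch below tries the old call where the installed pandas still accepts it and compares it with the new call. The sample DataFrame and variable names are assumptions of mine. The TypeError branch covers newer releases, where the warning quoted in the question has become an error.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0, 5.0, 6.0]})
n = 3

# New call: simply omit skipna.
new = df.rolling(n).mean()

# Old call: works silently on 1.2.x, warns on 1.5.x, and is rejected
# with a TypeError once the keyword has been removed.
try:
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        old = df.rolling(n).mean(skipna=False)
except TypeError:
    old = None

# Where the old call is still accepted, both results should be identical
# (DataFrame.equals treats NaN in the same position as equal).
same = old is None or old.equals(new)
print(same)
```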