Weighted data problems: mean is fine, but covariance and std look wrong. How do I adjust?


I'm trying to apply a weighting filter to my data, rather than use the raw data, before calculating the stats (mean, std and covariance). But the results clearly need adjusting.

# generate some data and a filter
import numpy as np
from pandas import DataFrame

f_n = 100  # an integer: recent NumPy versions reject a float dimension in np.random.rand
np.random.seed(seed=101)
foo = np.random.rand(f_n, 3)
foo = DataFrame(foo).add(1).pct_change()
f_filter = np.arange(f_n, 0.0, -1)
f_filter = 1.0 / (f_filter ** (f_filter / f_n))
# normalise the filter ... This could be where I'm going wrong?
f_filter = f_filter * (f_n / f_filter.sum())

Now we are ready to look at some results:

print(foo.mul(f_filter, axis=0).mean())
print(foo.mean())

0    0.039147
1    0.039013
2    0.037598
dtype: float64
0    0.035006
1    0.042244
2    0.041956
dtype: float64

The means all look in line, but the covariance and std differ significantly in scale and also in direction:

print(foo.mul(f_filter, axis=0).cov())
print(foo.cov())

          0         1         2
0  0.124766 -0.038954  0.027256
1 -0.038954  0.204269  0.056185
2  0.027256  0.056185  0.203934

          0         1         2
0  0.070063 -0.014926  0.010434
1 -0.014926  0.099249  0.015573
2  0.010434  0.015573  0.087060

print(foo.mul(f_filter, axis=0).std())
print(foo.std())

0    0.353223
1    0.451961
2    0.451590
dtype: float64
0    0.264694
1    0.315037
2    0.295060
dtype: float64

Any ideas why? How can we adjust the filter, or adjust the covariance matrix, to make the results more comparable?


There is 1 answer

Answer by Jianxun Li (best answer)

The problem is your weighting function. (Also, do you want Gaussian random numbers or uniform ones?) See the plot of the filter produced by this code:

import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame

f_n = 100  # an integer, since np.random.rand needs integer dimensions
np.random.seed(seed=101)
# Do you want uniform random variables here, or is this a typo for normal ones (np.random.randn)?
foo = np.random.rand(f_n, 3)
foo = DataFrame(foo)
f_filter = np.arange(f_n, 0.0, -1)

# Here is the problem: the uneven weights create an artificial trend, which makes the data non-stationary.
# Covariance is only meaningful for stationary data.
# =============================================
f_filter = 1.0 / (f_filter ** (f_filter / f_n))

fig, ax = plt.subplots()
ax.plot(f_filter)

[Figure: plot of f_filter against the row index]
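As a rough check of this point, here is a minimal sketch (my own illustration, not part of the original code) that multiplies a constant series by the same filter. The names `const` and `weighted_const` are made up for the example. A constant series has zero spread, so any spread after the multiplication comes from the weights alone:

```python
import numpy as np

f_n = 100
f_filter = np.arange(f_n, 0.0, -1)
f_filter = 1.0 / (f_filter ** (f_filter / f_n))
f_filter = f_filter * (f_n / f_filter.sum())  # same normalisation as in the question

const = np.ones(f_n)               # hypothetical constant series: zero variance
weighted_const = const * f_filter  # element-wise weighting, as foo.mul(f_filter, axis=0) does

print(const.std())            # 0.0
print(weighted_const.std())   # positive: spread introduced purely by the weights
```

So scaling each row before calling std() or cov() measures the spread of the weights as well as the spread of the data.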

Uneven weights create an artificial trend (your random numbers are all positive uniforms), which makes the series non-stationary. Covariance is only meaningful for stationary data. Take a look at the resulting weighted data:

[Figure: plot of the resulting weighted data]
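If the goal is weighted statistics rather than statistics of rescaled data, one possible approach is to leave the observations alone and pass the weights to the estimator instead. The sketch below assumes that is what the question is after. The names `w`, `mu_w`, `cov_w` and `std_w` are my own, and the estimate is the biased (ddof=0) weighted covariance, not necessarily the one you need:

```python
import numpy as np
import pandas as pd

np.random.seed(101)
f_n = 100
# Same data as in the question; the first row of pct_change() is NaN, so drop it
foo = pd.DataFrame(np.random.rand(f_n, 3)).add(1).pct_change().dropna()

w = np.arange(f_n, 0.0, -1)
w = 1.0 / (w ** (w / f_n))
w = w[1:]          # drop the weight that belonged to the NaN row
w = w / w.sum()    # assumption: normalise the weights to sum to 1

x = foo.values
mu_w = np.average(x, axis=0, weights=w)   # weighted mean of each column
dev = x - mu_w                            # deviations from the weighted mean
cov_w = (dev * w[:, None]).T.dot(dev)     # sum_i w_i * dev_i * dev_i^T
std_w = np.sqrt(np.diag(cov_w))           # weighted standard deviations

# NumPy's built-in weighted covariance gives the same biased estimate
cov_np = np.cov(x, rowvar=False, aweights=w, ddof=0)

print(mu_w)
print(std_w)
print(cov_w)
```

With equal weights this reduces to the ordinary ddof=0 covariance, so the weighted and unweighted results stay on the same scale.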