I have a DataFrame df1:
import pandas as pd
import numpy as np
import statsmodels.formula.api as sm
df1 = pd.DataFrame(np.random.randn(3000, 1), index=pd.date_range('1/1/1990', periods=3000), columns=["M"])
I would like to group the elements into boxes of size 10, fit each box using OLS and compute Y_t, where Y_t is the series of straight-line fits.
In other words, I would like to take the first 10 values, fit them using OLS (Y_t = b*X_t + a_0) and obtain the fitted values Y_t for these 10 points. Then do the same for the next 10 values (not a rolling window!), and so on.
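For reference, the per-box fit described above has a closed form. In box k (k = 0, 1, 2, ...), point j (j = 1, ..., 10) corresponds to row t = 10k + j - 1, and the predictor is taken as x_j = j, which matches the column B used in the code further down. Writing y_{k,j} for the value of M at that row:

```latex
\hat b_k = \frac{\sum_{j=1}^{10} (x_j - \bar x)(y_{k,j} - \bar y_k)}{\sum_{j=1}^{10} (x_j - \bar x)^2},
\qquad
\hat a_{0,k} = \bar y_k - \hat b_k \, \bar x,
\qquad
Y_{10k+j-1} = \hat b_k \, x_j + \hat a_{0,k}

% with x_j = j: \bar x = 5.5 and \sum_{j=1}^{10} (x_j - \bar x)^2 = 82.5
```

Because the denominator is the same for every box, all 300 slopes can be computed at once instead of calling a fitting routine 300 times.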
My approach
The first issue I faced was that I could not fit the elements using the DateTime values as predictors, so I defined a new DataFrame df_fit with two columns, A and B. Column B contains the integers 1 to 10 (the predictor), and column A contains the values of df1 for one group of 10 elements (the response):
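def compute_yt(df, i, bs):
    # Box i covers rows i*bs .. (i+1)*bs - 1 of df.
    # B = 1..bs is the predictor; A holds the matching values of column M.
    df_fit = pd.DataFrame({"B": np.arange(1, bs + 1),
                           "A": df.reset_index().loc[i*bs:(i+1)*bs - 1, "M"]})
    fit = sm.ols(formula="A ~ B", data=df_fit).fit()
    # Fitted straight line Y_t = b*B + a_0 for this box
    yt = fit.params.B * df_fit["B"] + fit.params.Intercept
    return yt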
Here bs is the box size (10 in this example) and i is the box index used to sweep over all the values.
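As a point of comparison, the same per-box straight-line fit can be done for every box at once with NumPy, without slicing the DataFrame in a loop. This is a sketch under two assumptions: the series length is an exact multiple of the box size (3000 / 10 here), and the predictor is 1..bs, as in compute_yt. The name box_fits is hypothetical.

```python
import numpy as np
import pandas as pd


def box_fits(series: pd.Series, bs: int = 10) -> pd.Series:
    """Fit a straight line to each consecutive, non-overlapping box of `bs`
    points and return the fitted values aligned with the input index.

    Assumes len(series) is a multiple of bs.
    """
    if len(series) % bs:
        raise ValueError("length must be a multiple of the box size")
    y = series.to_numpy(dtype=float).reshape(-1, bs)  # one row per box
    x = np.arange(1, bs + 1, dtype=float)             # same predictor as column B
    xc = x - x.mean()
    # Closed-form OLS slope and intercept for every box simultaneously
    slope = (y - y.mean(axis=1, keepdims=True)) @ xc / (xc @ xc)
    intercept = y.mean(axis=1) - slope * x.mean()
    fitted = slope[:, None] * x + intercept[:, None]  # shape (n_boxes, bs)
    # Row-major ravel puts the boxes back in their original order
    return pd.Series(fitted.ravel(), index=series.index, name=series.name)
```

With the question's data, box_fits(df1["M"]).to_frame("Y_t") gives a DataFrame with the same DatetimeIndex and length as df1. The values should match compute_yt box by box, because both solve the same least-squares problem.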
Finally:
bs = 10
result = [compute_yt(df1, n, bs) for n in range(len(df1) // bs)]
result =
[..., 840   -0.249590
841   -0.249935
842   -0.250280
843   -0.250625
844   -0.250970
845   -0.251315
846   -0.251660
847   -0.252005
848   -0.252350
849   -0.252695
Name: B, dtype: float64, 850   -0.252631
851   -0.252408
...]
Here result is a list that should contain the straight-line fit values, one Series per box.
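Each Series returned by compute_yt is indexed by integer row position (because of the reset_index() call), so the list can be stitched back into a time series aligned with df1. A minimal sketch, assuming result is a list of such Series; the helper name stitch and the column name Y_t are hypothetical:

```python
import numpy as np
import pandas as pd


def stitch(result, like: pd.DataFrame, name: str = "Y_t") -> pd.DataFrame:
    """Concatenate per-box fitted Series (integer-position index) into a
    DataFrame indexed like `like`."""
    fitted = pd.concat(result)  # integer positions, in box order
    # Map the integer positions back onto the original DatetimeIndex
    out = pd.DataFrame({name: fitted.to_numpy()},
                       index=like.index[fitted.index.to_numpy()])
    # Rows that were never fitted (e.g. a skipped box) become NaN,
    # so the result always has the same length as `like`
    return out.reindex(like.index)
```

With the question's objects this would be called as stitch(result, df1).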
So, my questions are the following:
1. Is there a way to run an OLS using DateTime values as predictors?
2. I would like to use the list comprehension to build a DataFrame (with the same shape as df1) containing the values of Y_t. This relates to question (1) in the sense that I would like to obtain a time series for these values.
3. Is there a more "pythonic" way to write this code? The way I have sliced the DataFrame does not seem very suitable.
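Regarding questions (1) and (2): least-squares routines need a numeric X, but a DatetimeIndex can be converted to one, for example as the number of days elapsed since the first timestamp. The sketch below does that, groups the rows into non-overlapping boxes with groupby, and returns a DataFrame with the same index as the input. The function name box_fits_datetime is hypothetical, and np.polyfit is swapped in for statsmodels only to keep the example short. For the daily index in the question, the fitted values are the same as with the predictor 1..10, since shifting X does not change a straight-line fit.

```python
import numpy as np
import pandas as pd


def box_fits_datetime(df: pd.DataFrame, col: str = "M", bs: int = 10) -> pd.DataFrame:
    """Per-box straight-line fits using the DatetimeIndex itself as predictor.

    Dates are converted to a float number of days since the first timestamp.
    Every box needs at least 2 points for a straight-line fit.
    """
    days = (df.index - df.index[0]) / pd.Timedelta(days=1)
    work = pd.DataFrame({"x": np.asarray(days, dtype=float),
                         "y": df[col].to_numpy()}, index=df.index)
    box = np.arange(len(df)) // bs  # 0,...,0,1,...,1,... (non-overlapping boxes)

    def fit(g: pd.DataFrame) -> pd.Series:
        slope, intercept = np.polyfit(g["x"], g["y"], 1)
        return slope * g["x"] + intercept  # keeps the box's DatetimeIndex

    fitted = pd.concat([fit(g) for _, g in work.groupby(box)])
    return fitted.to_frame("Y_t")
```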
I'm not entirely sure this is what you want, but I first added a group number and an observation number to each row of your DataFrame and then pivoted it so that every row holds 10 observations.
Next, I wrote a function that runs ordinary least squares with statsmodels (the array-based API, not the formula API).
Finally, I called this function on every row via apply, which returns the predicted values for each original set of 10 values.
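The answer's code is not reproduced here, so the following is only a hedged reconstruction of the steps it describes: add a group number and an observation number, pivot to one row per group, fit each row, and unpivot. The answer used statsmodels' non-formula OLS (in statsmodels.api, something like sm.OLS(y, sm.add_constant(x)).fit().fittedvalues); this sketch swaps in numpy.linalg.lstsq, which solves the same least-squares problem, to stay dependency-light. The name pivot_and_fit is hypothetical, and the length of the data is assumed to be a multiple of the box size, otherwise the last pivoted row contains NaN.

```python
import numpy as np
import pandas as pd


def pivot_and_fit(df: pd.DataFrame, col: str = "M", bs: int = 10) -> pd.DataFrame:
    """Pivot + apply sketch: one row per box of `bs` values, OLS per row."""
    work = df[[col]].copy()
    work["group"] = np.arange(len(work)) // bs  # box number
    work["obs"] = np.arange(len(work)) % bs     # position inside the box
    wide = work.pivot(index="group", columns="obs", values=col)  # (n_boxes, bs)

    X = np.column_stack([np.ones(bs), np.arange(bs)])  # intercept + position

    def ols_row(row: pd.Series) -> pd.Series:
        # Same problem sm.OLS(row, sm.add_constant(np.arange(bs))) would solve
        coef, *_ = np.linalg.lstsq(X, row.to_numpy(dtype=float), rcond=None)
        return pd.Series(X @ coef, index=row.index)

    fitted_wide = wide.apply(ols_row, axis=1)  # fitted values, one row per box
    # Row-major flattening restores the original row order
    return pd.DataFrame({"Y_t": fitted_wide.to_numpy().ravel()}, index=df.index)
```

The pivot makes each box an explicit row, which reads naturally with apply; the trade-off is one Python-level function call per box, whereas a fully vectorized version avoids that loop.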