I'm trying to calculate the rolling beta between two columns in a DataFrame.
I explain myself, the beta (finance) classically answers the following formula: cov(asset_1, asset_2)/var(asset_2). For my case I want to calculate a rolling beta using a Kalman filter. This Kalman filter is then initialized with the classical beta formula.
My problem is the following: I have a data frame containing two columns (asset_1 and asset_2). And a beta_kalman(s1,s2) function (using pykalman) .
The beta kalman function takes as parameters two series corresponding to s1=asset_1 and s2=asset_2.
Currently I do it with a simple method via a for loop by shifting the index of 1 at each increment by giving in my function the corresponding values of each asset on the window.
The problem is that the execution time is too long.
Would he have a solution to decrease this computation time ? I was imagining something like : df.rolling(window).apply(beta_kalman) but this does not work.
Would you have an idea for a solution ?
N.B : I have to implement this under python 3.6.3 with pandas 0.20.3
Thanks in advance for your ideas !
Sample data, and the minimum code required to recreate the issue below :
from pandas import Timestamp
import pandas as pd
import numpy as np
from pykalman import KalmanFilter
My function to compute beta with pykalman :
def beta_kalman(s1,s2, delta=1e-2):
beta_init = (np.cov(s1, s2)[0, 1] / np.var(s2))
trans_cov = delta / (1 - delta) * np.eye(2)
obs_mat = np.vstack([s2, np.ones(s2.shape)]).T[:, np.newaxis]
kf = KalmanFilter(n_dim_obs=1, # 1-D
n_dim_state=2, # 2-D
initial_state_mean=[beta_init, 0],
initial_state_covariance=np.ones((2, 2)),
transition_matrices=np.eye(2),
observation_matrices=obs_mat,
observation_covariance=2,
transition_covariance=trans_cov,
)
state_means, _ = kf.filter(s1.values)
return state_means[:, 0]
Sample data :
df_returns = {'asset_1': {Timestamp('2015-02-02 00:00:00', freq='B'): -0.00065638527967171179,
Timestamp('2015-02-03 00:00:00', freq='B'): 0.0022343530982782411,
Timestamp('2015-02-04 00:00:00', freq='B'): 0.00047087917232135901,
Timestamp('2015-02-05 00:00:00', freq='B'): 0.00068940734601552478,
Timestamp('2015-02-06 00:00:00', freq='B'): 0.001155443533138456,
Timestamp('2015-02-09 00:00:00', freq='B'): -0.00073878429513896116,
Timestamp('2015-02-10 00:00:00', freq='B'): 6.5331180920669141e-06,
Timestamp('2015-02-11 00:00:00', freq='B'): -0.00047848662447047552,
Timestamp('2015-02-12 00:00:00', freq='B'): 0.00075100030101071802,
Timestamp('2015-02-13 00:00:00', freq='B'): 0.0011705611535068883,
Timestamp('2015-02-16 00:00:00', freq='B'): 0.00051393092538964957,
Timestamp('2015-02-17 00:00:00', freq='B'): -0.00048847349932235051,
Timestamp('2015-02-18 00:00:00', freq='B'): 0.0012106608634878668,
Timestamp('2015-02-19 00:00:00', freq='B'): 0.0013241124699925333,
Timestamp('2015-02-20 00:00:00', freq='B'): 0.00071000350611760688,
Timestamp('2015-02-23 00:00:00', freq='B'): 0.0018171290896187298,
Timestamp('2015-02-24 00:00:00', freq='B'): 0.00239364252208496,
Timestamp('2015-02-25 00:00:00', freq='B'): 0.0015992532863815523,
Timestamp('2015-02-26 00:00:00', freq='B'): 0.0019965436705504658,
Timestamp('2015-02-27 00:00:00', freq='B'): 0.0011555193318930623},
'asset_2': {Timestamp('2015-02-02 00:00:00', freq='B'): 0.0055712469218620608,
Timestamp('2015-02-03 00:00:00', freq='B'): 0.01307503061081472,
Timestamp('2015-02-04 00:00:00', freq='B'): 0.0003952002997402726,
Timestamp('2015-02-05 00:00:00', freq='B'): -0.0017481486478068131,
Timestamp('2015-02-06 00:00:00', freq='B'): -0.0031670739060284392,
Timestamp('2015-02-09 00:00:00', freq='B'): -0.014835535182983417,
Timestamp('2015-02-10 00:00:00', freq='B'): 0.010569586765601269,
Timestamp('2015-02-11 00:00:00', freq='B'): -0.002657959034321089,
Timestamp('2015-02-12 00:00:00', freq='B'): 0.012883068432518074,
Timestamp('2015-02-13 00:00:00', freq='B'): 0.008773174815372986,
Timestamp('2015-02-16 00:00:00', freq='B'): -0.0041451490599345719,
Timestamp('2015-02-17 00:00:00', freq='B'): 0.0014955867933237332,
Timestamp('2015-02-18 00:00:00', freq='B'): 0.0079578196824314773,
Timestamp('2015-02-19 00:00:00', freq='B'): 0.0064298048361444149,
Timestamp('2015-02-20 00:00:00', freq='B'): 0.0007021736832582004,
Timestamp('2015-02-23 00:00:00', freq='B'): 0.0083229889997538109,
Timestamp('2015-02-24 00:00:00', freq='B'): 0.007817259530292997,
Timestamp('2015-02-25 00:00:00', freq='B'): -0.001499671169154615,
Timestamp('2015-02-26 00:00:00', freq='B'): 0.0093629482797668029,
Timestamp('2015-02-27 00:00:00', freq='B'): 0.0067304325978827517}}
df_returns = pd.DataFrame.from_dict(df_returns)
What I actually do, which works but way to long :
df_betas = pd.DataFrame(index=df_returns.index, columns=['BETA'])
window=5
for i in range(len(df_returns.index) - window+1):
temp_returns = df_returns[['asset_1', 'asset_2']].iloc[i:i + window].copy()
df_betas.loc[temp_returns.index[-1], 'BETA'] = beta_kalman(temp_returns['asset_1'], temp_returns['asset_2'])[-1]
What I want to do :
df_betas2 = df_returns.copy()
df_betas2 = df_betas2.rolling(window).apply(beta_kalman)
Error I get :
TypeError: beta_kalman() missing 1 required positional argument: 's2'
This error is logical, because with apply there is only one argument that is passed to beta_kalman function. But the problem is that what is passed as an argument via apply corresponds to an array (shape=(5,)) of the first column (asset_1) values and not to both columns.