Apply function from rolling window on multiple columns

413 views Asked by At

I'm trying to calculate the rolling beta between two columns in a DataFrame.

I explain myself, the beta (finance) classically answers the following formula: cov(asset_1, asset_2)/var(asset_2). For my case I want to calculate a rolling beta using a Kalman filter. This Kalman filter is then initialized with the classical beta formula.

My problem is the following: I have a data frame containing two columns (asset_1 and asset_2). And a beta_kalman(s1,s2) function (using pykalman) .

The beta kalman function takes as parameters two series corresponding to s1=asset_1 and s2=asset_2.

Currently I do it with a simple method via a for loop by shifting the index of 1 at each increment by giving in my function the corresponding values of each asset on the window.

The problem is that the execution time is too long.

Would he have a solution to decrease this computation time ? I was imagining something like : df.rolling(window).apply(beta_kalman) but this does not work.

Would you have an idea for a solution ?

N.B : I have to implement this under python 3.6.3 with pandas 0.20.3

Thanks in advance for your ideas !

Sample data, and the minimum code required to recreate the issue below :

from pandas import Timestamp
import pandas as pd
import numpy as np
from pykalman import KalmanFilter

My function to compute beta with pykalman :

def beta_kalman(s1,s2, delta=1e-2):
    beta_init = (np.cov(s1, s2)[0, 1] / np.var(s2))
    trans_cov = delta / (1 - delta) * np.eye(2)
    obs_mat = np.vstack([s2, np.ones(s2.shape)]).T[:, np.newaxis]

    kf = KalmanFilter(n_dim_obs=1,  # 1-D
                      n_dim_state=2,  # 2-D
                      initial_state_mean=[beta_init, 0],
                      initial_state_covariance=np.ones((2, 2)),
                      transition_matrices=np.eye(2),
                      observation_matrices=obs_mat,
                      observation_covariance=2,
                      transition_covariance=trans_cov,
                      )
    state_means, _ = kf.filter(s1.values)

    return state_means[:, 0]

Sample data :

df_returns = {'asset_1': {Timestamp('2015-02-02 00:00:00', freq='B'): -0.00065638527967171179,
                          Timestamp('2015-02-03 00:00:00', freq='B'): 0.0022343530982782411,
                          Timestamp('2015-02-04 00:00:00', freq='B'): 0.00047087917232135901,
                          Timestamp('2015-02-05 00:00:00', freq='B'): 0.00068940734601552478,
                          Timestamp('2015-02-06 00:00:00', freq='B'): 0.001155443533138456,
                          Timestamp('2015-02-09 00:00:00', freq='B'): -0.00073878429513896116,
                          Timestamp('2015-02-10 00:00:00', freq='B'): 6.5331180920669141e-06,
                          Timestamp('2015-02-11 00:00:00', freq='B'): -0.00047848662447047552,
                          Timestamp('2015-02-12 00:00:00', freq='B'): 0.00075100030101071802,
                          Timestamp('2015-02-13 00:00:00', freq='B'): 0.0011705611535068883,
                          Timestamp('2015-02-16 00:00:00', freq='B'): 0.00051393092538964957,
                          Timestamp('2015-02-17 00:00:00', freq='B'): -0.00048847349932235051,
                          Timestamp('2015-02-18 00:00:00', freq='B'): 0.0012106608634878668,
                          Timestamp('2015-02-19 00:00:00', freq='B'): 0.0013241124699925333,
                          Timestamp('2015-02-20 00:00:00', freq='B'): 0.00071000350611760688,
                          Timestamp('2015-02-23 00:00:00', freq='B'): 0.0018171290896187298,
                          Timestamp('2015-02-24 00:00:00', freq='B'): 0.00239364252208496,
                          Timestamp('2015-02-25 00:00:00', freq='B'): 0.0015992532863815523,
                          Timestamp('2015-02-26 00:00:00', freq='B'): 0.0019965436705504658,
                          Timestamp('2015-02-27 00:00:00', freq='B'): 0.0011555193318930623},
              'asset_2': {Timestamp('2015-02-02 00:00:00', freq='B'): 0.0055712469218620608,
                          Timestamp('2015-02-03 00:00:00', freq='B'): 0.01307503061081472,
                          Timestamp('2015-02-04 00:00:00', freq='B'): 0.0003952002997402726,
                          Timestamp('2015-02-05 00:00:00', freq='B'): -0.0017481486478068131,
                          Timestamp('2015-02-06 00:00:00', freq='B'): -0.0031670739060284392,
                          Timestamp('2015-02-09 00:00:00', freq='B'): -0.014835535182983417,
                          Timestamp('2015-02-10 00:00:00', freq='B'): 0.010569586765601269,
                          Timestamp('2015-02-11 00:00:00', freq='B'): -0.002657959034321089,
                          Timestamp('2015-02-12 00:00:00', freq='B'): 0.012883068432518074,
                          Timestamp('2015-02-13 00:00:00', freq='B'): 0.008773174815372986,
                          Timestamp('2015-02-16 00:00:00', freq='B'): -0.0041451490599345719,
                          Timestamp('2015-02-17 00:00:00', freq='B'): 0.0014955867933237332,
                          Timestamp('2015-02-18 00:00:00', freq='B'): 0.0079578196824314773,
                          Timestamp('2015-02-19 00:00:00', freq='B'): 0.0064298048361444149,
                          Timestamp('2015-02-20 00:00:00', freq='B'): 0.0007021736832582004,
                          Timestamp('2015-02-23 00:00:00', freq='B'): 0.0083229889997538109,
                          Timestamp('2015-02-24 00:00:00', freq='B'): 0.007817259530292997,
                          Timestamp('2015-02-25 00:00:00', freq='B'): -0.001499671169154615,
                          Timestamp('2015-02-26 00:00:00', freq='B'): 0.0093629482797668029,
                          Timestamp('2015-02-27 00:00:00', freq='B'): 0.0067304325978827517}}

df_returns = pd.DataFrame.from_dict(df_returns)

What I actually do, which works but way to long :

df_betas = pd.DataFrame(index=df_returns.index, columns=['BETA'])
window=5
for i in range(len(df_returns.index) - window+1):
    temp_returns = df_returns[['asset_1', 'asset_2']].iloc[i:i + window].copy()

    df_betas.loc[temp_returns.index[-1], 'BETA'] = beta_kalman(temp_returns['asset_1'], temp_returns['asset_2'])[-1]

What I want to do :

df_betas2 = df_returns.copy()
df_betas2 = df_betas2.rolling(window).apply(beta_kalman)

Error I get :

TypeError: beta_kalman() missing 1 required positional argument: 's2'

This error is logical, because with apply there is only one argument that is passed to beta_kalman function. But the problem is that what is passed as an argument via apply corresponds to an array (shape=(5,)) of the first column (asset_1) values and not to both columns.

0

There are 0 answers