The Factor Model, covariance estimation
I'm trying to replicate the factor covariance matrix estimation process in Python, but I have some doubts about my implementation.
When I z-score the returns, I get strange results; when I only demean them, the output looks much more coherent. This confuses me, because I'm trying to validate my implementation by replicating the results obtained with sklearn. I've used the demeaned returns for the correlation matrix, the factor returns, and the residuals. Some implementations I found online use demeaned returns only for the correlation matrix and factor returns, and compute the residuals from the raw returns.
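To make the demeaning versus z-scoring question concrete, here is a minimal sketch on synthetic data (the volatilities, sample size, and the helper name `top_eig` are assumptions for illustration, not part of my real setup). It checks that the correlation matrix is the covariance matrix of z-scored returns, so correlation eigenvectors pair naturally with z-scored returns and covariance eigenvectors with demeaned returns. Mixing the two gives factor variances that no longer match the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "returns": 200 periods, 5 assets with very different volatilities (assumed data).
vols = np.array([0.01, 0.02, 0.03, 0.05, 0.08])
r = rng.standard_normal((200, 5)) * vols + 0.001

r_demeaned = r - r.mean(axis=0)
r_z = r_demeaned / r.std(axis=0, ddof=1)  # z-scores: demeaned and scaled to unit variance

cov = np.cov(r, rowvar=False)
corr = np.corrcoef(r, rowvar=False)

# The correlation matrix equals the covariance matrix of the z-scored returns.
corr_matches_z = np.allclose(np.cov(r_z, rowvar=False), corr)


def top_eig(mat, k):
    """Return the k largest eigenvalues and their eigenvectors (hypothetical helper)."""
    val, vec = np.linalg.eigh(mat)  # eigh: intended for symmetric matrices
    order = np.argsort(val)[::-1][:k]
    return val[order], vec[:, order]


k = 2
val_cov, vec_cov = top_eig(cov, k)
val_corr, vec_corr = top_eig(corr, k)

# Consistent pairings: the factor-return variances equal the eigenvalues.
f_cov = r_demeaned @ vec_cov
f_corr = r_z @ vec_corr
cov_pair_ok = np.allclose(f_cov.var(axis=0, ddof=1), val_cov)
corr_pair_ok = np.allclose(f_corr.var(axis=0, ddof=1), val_corr)

# Mixed pairing: correlation eigenvectors applied to merely demeaned returns.
f_mixed = r_demeaned @ vec_corr
mixed_pair_ok = np.allclose(f_mixed.var(axis=0, ddof=1), val_corr)

print(corr_matches_z, cov_pair_ok, corr_pair_ok, mixed_pair_ok)
```

In this sketch only the mixed pairing fails the check, which may be one source of the inconsistency described above.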
I've been considering the loadings (factor betas/exposures) as the eigenvectors obtained from PCA. Is it correct to assume that in PCA, the loadings are indeed the eigenvectors, while in factor analysis, they are defined as pca.components_ * sqrt(eigenvalues)?
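To pin down the two conventions in that question, here is a small NumPy-only sketch on synthetic data (the data and variable names are assumptions). It treats the unit-length eigenvectors as one definition of loadings and the eigenvectors scaled by the square root of the eigenvalues as the other. As far as I understand, in sklearn the eigenvectors correspond to the rows of `pca.components_` (up to sign) and the eigenvalues to `pca.explained_variance_`, but the sketch does not depend on sklearn.

```python
import numpy as np

rng = np.random.default_rng(1)

# Correlated synthetic data: 300 observations of 4 variables (assumed data).
x = rng.standard_normal((300, 4)) @ rng.standard_normal((4, 4))
x = x - x.mean(axis=0)

cov = np.cov(x, rowvar=False)
eig_val, eig_vec = np.linalg.eigh(cov)
order = np.argsort(eig_val)[::-1]  # sort by descending eigenvalue
eig_val, eig_vec = eig_val[order], eig_vec[:, order]

# Convention 1: loadings are the eigenvectors themselves (orthonormal columns).
orthonormal = np.allclose(eig_vec.T @ eig_vec, np.eye(4))

# Convention 2: loadings are eigenvectors scaled by sqrt(eigenvalue).
loadings = eig_vec * np.sqrt(eig_val)

# With all components kept, the scaled loadings reproduce the covariance matrix.
reconstructs = np.allclose(loadings @ loadings.T, cov)

# The two conventions describe the same common component:
# raw scores with eigenvectors, or unit-variance scores with scaled loadings.
scores = x @ eig_vec
std_scores = scores / np.sqrt(eig_val)
same_common = np.allclose(scores @ eig_vec.T, std_scores @ loadings.T)

print(orthonormal, reconstructs, same_common)
```

Under these assumptions the two definitions differ only in where the scale lives: in the factor returns (eigenvector convention) or in the loadings (scaled convention).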
Here is my Python implementation:
import yfinance as yf
import numpy as np
import pandas as pd
symbols = ["AAPL","MSFT","AMZN","NVDA","TSLA","GOOGL","META","JNJ","XOM","UNH"]
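# Weekly (Friday) closing prices, adjusted for splits and dividends
df = (
    yf.download(symbols, start="2022-03-11", end="2024-02-01", auto_adjust=True)["Close"]
    .resample("W-FRI")
    .last()
)
# Weekly simple returns as a (periods x assets) array
r = df.pct_change().dropna().values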
npca = 4
r_demeaned = r - r.mean(axis=0)
S = np.corrcoef(r.T)
# S is symmetric, so eigh is the appropriate solver and guarantees real output
eig_val, eig_vec = np.linalg.eigh(S)
idx = np.flip(np.argsort(eig_val))
eig_val, eig_vec = (eig_val[idx[:npca]], eig_vec[:,idx[:npca]])
factor_betas = eig_vec
factor_returns = np.round(r_demeaned@factor_betas,3)
# Returns are weekly (resampled to W-FRI), so annualize with 52 periods, not 252
factor_cov_matrix = np.diag(np.var(factor_returns, axis=0, ddof=1))*52
common_returns = np.round(np.dot(factor_returns,factor_betas.T),3)
residuals = np.round(r_demeaned - common_returns,3)
idiosyncratic_var_matrix = np.diag(np.var(residuals, axis=0, ddof=1))*52
idiosyncratic_var_vector = np.diag(idiosyncratic_var_matrix)
factor_risk = np.dot(np.dot(factor_betas, factor_cov_matrix), factor_betas.T) +\
idiosyncratic_var_matrix
pd.DataFrame(factor_risk, columns=df.columns, index=df.columns)
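As a sanity check on the decomposition itself, here is a hedged sketch on synthetic data (the data-generating process and all names are assumptions, and it uses covariance eigenvectors with demeaned returns rather than the correlation eigenvectors in my listing). It tests two things: whether the idiosyncratic variances change when residuals are computed from raw instead of demeaned returns, and whether the diagonal of the factor-model covariance matches the sample variances.

```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_assets, k = 250, 6, 2

# Synthetic returns driven by 2 common factors plus noise and a small drift (assumed data).
common = rng.standard_normal((n_obs, k)) * 0.02
exposures = rng.standard_normal((k, n_assets))
r = common @ exposures + rng.standard_normal((n_obs, n_assets)) * 0.01 + 0.002
r_demeaned = r - r.mean(axis=0)

cov = np.cov(r, rowvar=False)
val, vec = np.linalg.eigh(cov)
order = np.argsort(val)[::-1][:k]
B = vec[:, order]  # factor betas: top-k eigenvectors of the covariance matrix

f = r_demeaned @ B                  # factor returns
F = np.diag(f.var(axis=0, ddof=1))  # diagonal factor covariance
common_part = f @ B.T

# Residuals from demeaned vs. raw returns differ only by a constant per asset,
# so their variances are identical.
var_demeaned = (r_demeaned - common_part).var(axis=0, ddof=1)
var_raw = (r - common_part).var(axis=0, ddof=1)
same_idio_var = np.allclose(var_demeaned, var_raw)

# Factor-model covariance: B F B^T + diag(idiosyncratic variances).
model_cov = B @ F @ B.T + np.diag(var_demeaned)

# With this consistent pairing, the model reproduces each asset's sample variance.
diag_matches = np.allclose(np.diag(model_cov), np.diag(cov))

print(same_idio_var, diag_matches)
```

If this holds, the raw-versus-demeaned choice for the residuals should not affect the idiosyncratic variances, only the residual means.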
The factor covariance matrix is crucial because I will use it across multiple periods in portfolio optimization problems.
Any insights or clarification on these points would be greatly appreciated.