PCA in scikit-learn has an attribute called explained_variance_ which captures the variance explained by each component. I don't see anything similar for FactorAnalysis in scikit-learn. How can I compute the variance explained by each component for Factor Analysis?
Factor Analysis in sklearn: Explained Variance
4.8k views. Asked by vkmv.
There are 2 answers
Here is how you can do it:
Let fa be your fitted FactorAnalysis model. First get the components (loadings) matrix and the noise variance from it:
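import numpy as np
m = fa.components_  # loadings, shape (n_components, n_features)
n = fa.noise_variance_  # per-feature noise variance, shape (n_features,)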
Square the components matrix element-wise:
m1 = m**2
Compute the sum of each row of m1. Each row corresponds to one factor, so axis=1 sums the squared loadings across the features:
m2 = np.sum(m1,axis=1)
The percentage of variance explained by the first factor is then
pvar1 = (100*m2[0])/np.sum(m2)
Similarly, for the second factor:
pvar2 = (100*m2[1])/np.sum(m2)
However, some of the variance is attributed to the noise component. If you want to account for that in your variance explained, you will need to compute
pvar1_with_noise = (100*m2[0])/(np.sum(m2)+np.sum(n))
pvar2_with_noise = (100*m2[1])/(np.sum(m2)+np.sum(n))
and so on. Hope this helps.
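The steps above can be put together into one short script. This is only a sketch: the iris dataset, the StandardScaler step, n_components=2, and the names loadings, ss_loadings, pvar and pvar_with_noise are assumptions chosen for illustration and are not part of the answer itself.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Assumption: iris data, standardized, with 2 factors (illustration only).
X = StandardScaler().fit_transform(load_iris().data)
fa = FactorAnalysis(n_components=2, random_state=0).fit(X)

loadings = fa.components_      # shape (n_components, n_features)
noise = fa.noise_variance_     # shape (n_features,)

# Sum of squared loadings per factor (one value per row of components_).
ss_loadings = np.sum(loadings ** 2, axis=1)

# Percentage of the variance captured by the factors only.
pvar = 100 * ss_loadings / ss_loadings.sum()

# Percentage once the noise variance is added to the denominator.
pvar_with_noise = 100 * ss_loadings / (ss_loadings.sum() + noise.sum())

print(pvar)
print(pvar_with_noise)
```

The first array always sums to 100 because it only splits the variance among the factors; the second sums to less than 100, with the remainder attributed to noise.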
In the usual nomenclature of FA/PCA, the components_ output by scikit-learn may be referred to as loadings elsewhere. For example, the package FactorAnalyzer outputs loadings_, which are equivalent once you change the settings to match scikit-learn (i.e. set rotation=None, set method='ml', and make sure your data is standardized when input into the scikit-learn function, as FactorAnalyzer standardizes the data internally).
Compared to the components_ output of PCA from scikit-learn, which are unit-length eigenvectors, the FA ones are already scaled, so the explained variance can be extracted by summing the squares. Note that the proportion of variance explained is expressed here in terms of the total variance of the original variables, rather than the variance of the factors, which is what the answer from @Gaurav uses.
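A minimal sketch of that calculation, expressed relative to the total variance of the original variables. The iris dataset, n_components=2, and the names ss_loadings, total_var, prop_total and cum_total are assumptions for illustration. With standardized input the total variance equals the number of features, so dividing by X.shape[1] would give the same result.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Assumption: iris data, standardized, with 2 factors (illustration only).
X = StandardScaler().fit_transform(load_iris().data)
fa = FactorAnalysis(n_components=2, random_state=0).fit(X)

# Sum of squared loadings per factor (rows of components_ are factors).
ss_loadings = np.sum(fa.components_ ** 2, axis=1)

# Total variance of the original (here standardized) variables.
total_var = X.var(axis=0).sum()

# Proportion of total variance explained by each factor, and cumulatively.
prop_total = ss_loadings / total_var
cum_total = np.cumsum(prop_total)

print(prop_total)
print(cum_total)
```

These proportions will generally not sum to 1, since the part of each variable's variance that the factors do not capture is left to the noise term.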