I'm trying to run PCA on my data, which is a dataframe with 16 observations in rows and 11 features in columns.
In R, with prcomp, the output matrix has features in the rows and principal components in the columns. In Python, using sklearn, the format is reversed: the rows are observations (in my case, administrative units) and the columns are again the principal components. While the eigenvalues and component loadings differ between R and Python, the cumulative sums of explained variance and the correlations of the features with the principal components are the same.
I'm struggling to understand why these differences occur and how to interpret the Python results correctly. Any insights or explanations would be greatly appreciated.
R:
data_pca <- prcomp(data, scale = TRUE)
Python:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# standardize (center and scale) the features, analogous to scale = TRUE in prcomp
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)

# fit PCA; the result has observations in rows and principal components in columns
pca = PCA()
data_pca = pca.fit_transform(data_scaled)
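One known source of small numerical differences between the two setups is the scaling divisor: StandardScaler divides by the population standard deviation (ddof=0), while R's scale() and prcomp(scale = TRUE) use the sample standard deviation (ddof=1). Also, eigenvectors are only determined up to sign, so individual loadings may flip between implementations. A minimal sketch using a random stand-in matrix (not your real data) that reproduces R-style scaling in Python:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 11))  # stand-in for the 16x11 dataframe

# R-style scaling: center, then divide by the sample std (ddof=1),
# matching what scale() / prcomp(scale = TRUE) does
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

pca = PCA()
scores = pca.fit_transform(X_scaled)

# sklearn stores PCs in the rows of components_; transposing gives the
# features-in-rows layout of prcomp's $rotation
rotation = pca.components_.T
print(rotation.shape)  # (11, 11): features x PCs
print(scores.shape)    # (16, 11): observations x PCs
```

With this scaling, the eigenvalues should match prcomp's, and the loadings should agree up to the sign of each column.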
Edit: Results after I transposed the data to end up with the same shape. The results are odd and still differ from R.
explained_variance explained_variance_ratio cumulative_sum
PC1 1.541840e+01 8.760452e-01 0.876045
PC2 6.401815e-01 3.637395e-02 0.912419
PC3 5.191492e-01 2.949711e-02 0.941916
PC4 4.163386e-01 2.365560e-02 0.965572
PC5 3.616688e-01 2.054936e-02 0.986121
PC6 9.329659e-02 5.300943e-03 0.991422
PC7 8.263950e-02 4.695426e-03 0.996118
PC8 4.770578e-02 2.710556e-03 0.998828
PC9 1.481567e-02 8.417995e-04 0.999670
PC10 5.808094e-03 3.300053e-04 1.000000
PC11 8.392454e-33 4.768440e-34 1.000000
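The pattern in these edited results (a dominant PC1 and a last eigenvalue around 1e-33) is what one would expect from running PCA on the transposed matrix: PCA always treats rows as observations, so on the 11x16 transpose there are only 11 "observations", and after centering the data has rank at most 10, making the 11th eigenvalue numerically zero. A sketch with random stand-in data (not your actual values) illustrating that the two orientations answer different questions:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
X = rng.normal(size=(16, 11))  # stand-in for the original 16x11 data

# PCA on the intended orientation: 16 observations, 11 features
ev_orig = PCA().fit(X).explained_variance_

# PCA on the transpose: 11 "observations", 16 "features";
# centering 11 rows leaves rank <= 10, so the last eigenvalue is ~0
ev_transposed = PCA().fit(X.T).explained_variance_

print(ev_orig[-1])        # a genuinely positive smallest eigenvalue
print(ev_transposed[-1])  # numerically zero, like the PC11 row above
```

In other words, transposing does not reconcile the two outputs; it changes which covariance matrix is being decomposed.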
Just for fun, and since this is a good use case for reticulate, I built a test case comparing the two PCA calculations. Here's the test script; you can adapt it to your specific use case. As you will see, there are differences, but they are of the order of 1e-13 to 1e-14, numerically justifiable and negligible. At the level of the explained variance the differences are of the order of 1e-6, also very small.
P.S. Your transformations are not necessary, as you can see from this example; I did not have to use the scaler.