My principal components plotted using sklearn seem rotated by a few degrees. What have I missed?

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# generate synthetic data with a linear relationship
np.random.seed(0)
mean = [0, 0]  # mean of both features
cov = [[1, 0.8], [0.8, 1]]  # covariance matrix to sample from
data = np.random.multivariate_normal(mean, cov, 200000) # sample data points with given parameters

# feature cols
feature_1 = data[:, 0]
feature_2 = data[:, 1]

# perform PCA
pca = PCA(n_components=2)
principal_components = pca.fit_transform(data)

# principal component vectors
eigenvectors = pca.components_
eigenvalues = pca.explained_variance_
eigenvectors_unit = eigenvectors / np.linalg.norm(eigenvectors, axis=1, keepdims=True) # normalize, although sklearn's components_ rows are already unit vectors

pc1 = eigenvectors_unit[0, :] * np.sqrt(eigenvalues[0])
pc2 = eigenvectors_unit[1, :] * np.sqrt(eigenvalues[1])


# plot principal components
plt.figure(figsize=(8, 6))
sns.scatterplot(x=feature_1, y=feature_2, color='black', alpha=0.8)
plt.title('Principal Components')

origin = np.zeros(2)
plt.quiver(origin[0], origin[1], pc1[0], pc1[1], color='red', scale=5)
plt.quiver(origin[0], origin[1], pc2[0], pc2[1], color='blue', scale=5)
plt.tight_layout()
plt.show()

[Figure: the scatter plot produced by the code above, with the red PC1 arrow and blue PC2 arrow visibly tilted off the point cloud's diagonals]

I generated 200,000 points and wanted to see what the principal components that capture the most variance look like after PCA. I used sklearn's PCA, normalized the eigenvectors in case they were not already unit length, scaled each one by the square root of its eigenvalue, and plotted them. I have seen several pictures of what ideal principal components should look like, and they usually bisect the distribution along its axes of symmetry. For my covariance matrix [[1, 0.8], [0.8, 1]] the eigenvalues are 1.8 and 0.2 with eigenvectors along [1, 1] and [1, -1], so the arrows should lie almost exactly on the ±45° diagonals. But in my plot the principal components look tilted by some angle, and I cannot figure out what I missed or did wrong.
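
To narrow down whether the math or the plot is at fault, here is a minimal diagnostic sketch (assuming the fitted directions are correct and only the rendering is off). It prints the angle of each fitted component for comparison with the analytic ±45°, and redraws the arrows with angles='xy', scale_units='xy', scale=1 so quiver interprets them in data coordinates, together with an equal axis aspect; quiver's default angles='uv' draws arrow directions in screen space, so on a non-square, autoscaled axes even correct vectors can appear rotated.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# same synthetic data as above
np.random.seed(0)
cov = [[1, 0.8], [0.8, 1]]
data = np.random.multivariate_normal([0, 0], cov, 200000)

pca = PCA(n_components=2).fit(data)

# should print angles on the ±45° diagonals
# (the overall sign of each eigenvector is arbitrary)
for vec in pca.components_:
    print(np.degrees(np.arctan2(vec[1], vec[0])))

plt.figure(figsize=(6, 6))
plt.scatter(data[:, 0], data[:, 1], s=1, color='black', alpha=0.1)
for vec, var, color in zip(pca.components_, pca.explained_variance_, ['red', 'blue']):
    arrow = vec * np.sqrt(var)  # scale the unit eigenvector by the standard deviation along it
    # angles='xy' + scale_units='xy' + scale=1 draws the arrow from (0, 0)
    # to exactly (arrow[0], arrow[1]) in data coordinates
    plt.quiver(0, 0, arrow[0], arrow[1], color=color,
               angles='xy', scale_units='xy', scale=1)
plt.gca().set_aspect('equal')  # without this, a 45° vector is drawn at a different screen angle
plt.title('Principal Components')
plt.tight_layout()
plt.show()

If the printed angles land on the diagonals while the arrows in the original figure still look off, the tilt is presumably a rendering artifact of the default quiver settings and the unequal axis scaling rather than a problem with the PCA itself.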
