Principal Component Analysis and Clustering - Better Discrimination between Classes

I am performing unsupervised clustering on PCA scores. The first 7 components explain 50%, 13%, 9%, 8%, 5%, 3%, and 3% of the variance, respectively.

No single feature stands out in the loadings of PC1. However, some features do stand out in the loadings of the remaining PCs.
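
For reference, a minimal sketch of how per-component variance and loadings are typically read off, assuming scikit-learn is the tool in use (the question does not say). The data `X` here is a random placeholder, not the real dataset, and the standardization step is also an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Placeholder data (assumption): 200 samples, 10 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))

# Standardize so that no feature dominates purely because of its scale.
X_std = StandardScaler().fit_transform(X)

pca = PCA(n_components=7).fit(X_std)

# Fraction of total variance explained by each retained component.
ratios = pca.explained_variance_ratio_

# Rows are components, columns are original features. These are unit-length
# eigenvectors; some texts scale them by sqrt(explained variance) before
# calling them "loadings".
loadings = pca.components_

print(np.round(100 * ratios, 1))
# Index of the feature with the largest absolute weight in each component.
print(np.abs(loadings).argmax(axis=1))
```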

When I compare my results to the ground truth, the clustering is poor. If I exclude PC1, my results improve a bit.
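
A minimal sketch of this kind of with/without-PC1 comparison, under stated assumptions: scikit-learn, k-means as the clustering algorithm, and the adjusted Rand index as the agreement measure, none of which are given in the question. The data is synthetic and deliberately built so that the highest-variance direction is unrelated to the classes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Synthetic ground truth (assumption): 3 classes, 100 samples each.
n_per_class = 100
y_true = np.repeat(np.arange(3), n_per_class)
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])

# One high-variance feature that carries no class information,
# two features that separate the classes, and five pure-noise features.
nuisance = rng.normal(scale=10.0, size=(y_true.size, 1))
signal = centers[y_true] + rng.normal(size=(y_true.size, 2))
noise = rng.normal(size=(y_true.size, 5))
X = np.hstack([nuisance, signal, noise])

# PCA scores for the first 7 components (PCA centers the data itself).
scores = PCA(n_components=7).fit_transform(X)


def cluster_ari(features):
    """Cluster with k-means and score agreement with the ground truth."""
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
    return adjusted_rand_score(y_true, labels)


ari_with_pc1 = cluster_ari(scores)            # PC1..PC7
ari_without_pc1 = cluster_ari(scores[:, 1:])  # PC2..PC7

print(f"ARI with PC1:    {ari_with_pc1:.3f}")
print(f"ARI without PC1: {ari_without_pc1:.3f}")
```

In this constructed case the largest-variance direction carries no class information, which is one situation in which dropping PC1 can help a distance-based method. Whether that is what happens with the real data would have to be checked against its own loadings and labels.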

Why does my clustering algorithm discriminate better when I exclude the PC1 scores from the input data? And is it acceptable to do this, i.e. to leave out 50% of the variance of the original data?

Thanks

Figure: Clustering with PCA, with and without PC1 included in the input data.

There are 0 answers