Significance of 99% of variance covered by the first component in PCA


What does it mean when the first component accounts for more than 99% of the total variance in PCA? I have a feature matrix of size 500x1000, on which I used MATLAB's pca function, which returns [coeff,score,latent,tsquared,explained]. The variable 'explained' gives the percentage of variance accounted for by each component.

Answer by Ander Biguri (best answer)

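The 'explained' output tells you how much of the total variance in the data each principal component captures, that is, how well you could represent the data using just that component. In your case, it means that the first principal component alone accounts for more than 99% of the variance, so you can describe the data very accurately with it.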
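To make that concrete, here is how 'explained' relates to 'latent', assuming the usual definition in which 'latent' holds the principal component variances (the eigenvalues of the covariance matrix). The symbols \lambda_i and p are notation introduced here for illustration: \lambda_i is the i-th entry of 'latent' and p is the number of components returned.

```latex
\text{explained}_i = 100 \cdot \frac{\lambda_i}{\sum_{j=1}^{p} \lambda_j}
```

So a first value above 99 means that \lambda_1 is more than 99 times larger than all the remaining eigenvalues combined.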

Let's make a 2D example. Imagine you have data that is 100x2 and you do PCA.

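The result could look something like this (image taken from the internet):

[Figure: 2D scatter of blue data points, with a big green arrow along the first PCA dimension and a small green arrow along the second.]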

This data will give you an explained value of around 90% for the first principal component (the first PCA dimension, the big green arrow in the figure).

What does it mean?

It means that if you project all your data onto that line, the projected points retain about 90% of the total variance of the data (of course, you will lose the information in the direction of the second PCA dimension).

In your example, 99% visually means that almost all the blue points lie along the big green arrow, with very little variation in the direction of the small green arrow.
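The 2D picture can be reproduced numerically. Below is a minimal sketch in plain Python, not MATLAB's pca itself: it builds 100 synthetic 2D points scattered around a line and computes the percentage of variance per component from the closed-form eigenvalues of the 2x2 covariance matrix. The function names, the slope of 0.5, and the noise levels are all assumptions chosen for illustration.

```python
import math
import random


def explained_variance_2d(points):
    """Percentage of total variance along each of the two principal axes."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (sxx + syy) / 2
    radius = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    eigenvalues = [half_trace + radius, half_trace - radius]
    total = sum(eigenvalues)
    return [100 * ev / total for ev in eigenvalues]


def make_line_points(noise_std, n=100, seed=0):
    """Hypothetical data: points near the line y = 0.5 * x plus Gaussian noise."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        t = rng.gauss(0, 1)  # position along the line
        points.append((t, 0.5 * t + rng.gauss(0, noise_std)))
    return points


# Points hugging the line: nearly all variance is in the first component.
tight = explained_variance_2d(make_line_points(noise_std=0.01))
# Points spread more widely around the line: the first component explains less.
loose = explained_variance_2d(make_line_points(noise_std=0.4))

print("tight cloud:", tight)
print("loose cloud:", loose)
```

With the small noise level the first percentage comes out above 99, which is the situation in the question; increasing the noise moves variance into the second component and lowers the first percentage.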

Of course, this is much harder to visualize with 1000 dimensions instead of 2, but the idea is the same.