What does it mean when the first component accounts for more than 99% of the total variance in PCA? I have a feature matrix of size 500x1000 on which I used MATLAB's pca function, which returns [coeff,score,latent,tsquared,explained]. The variable 'explained' returns the percentage of the total variance accounted for by each component.
Significance of 99% of variance covered by the first component in PCA
4.5k views Asked by noob333 At
The 'explained' output tells you what percentage of the total variance in the data each principal component captures. In your case it means that the first principal component alone captures more than 99% of the variance, so using just that component describes the data very accurately.

Let's make a 2D example. Imagine you have data that is 100x2 and you run PCA on it. The result could look something like this (figure taken from the internet):
This data will give you an 'explained' value of around 90% for the first principal component (PCA 1st dimension, the big green arrow in the figure). What does that mean?
It means that if you project all your data onto that line, the projected points retain 90% of the total variance (of course, you will lose the information in the PCA 2nd dimension direction).
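If you want to check this numerically, here is a minimal sketch in plain Python (not MATLAB, just so it runs anywhere). The data, the helper name pca_2d and the spreads of 3 and 1 are my own assumptions, chosen so the first component lands near 90%. It computes the variance share of each component and then confirms that projecting onto the first axis keeps exactly that share of the variance.

```python
import math
import random

def pca_2d(points):
    """Return (mean, eigenvalues, first_axis) for a list of (x, y) points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + root, half_trace - root
    # Eigenvector belonging to lam1, normalized to unit length
    vx, vy = sxy, lam1 - sxx
    norm = math.hypot(vx, vy)
    axis = (vx / norm, vy / norm) if norm > 0 else (1.0, 0.0)
    return (mx, my), (lam1, lam2), axis

# Hypothetical 100x2 data: spread 3 along the diagonal, spread 1 across it
rng = random.Random(0)
points = []
for _ in range(100):
    t, s = rng.gauss(0, 3.0), rng.gauss(0, 1.0)
    points.append(((t - s) / math.sqrt(2), (t + s) / math.sqrt(2)))

(mx, my), lams, (ax, ay) = pca_2d(points)
explained = [100 * lam / sum(lams) for lam in lams]

# Project every point onto the first axis and rebuild it from that one score
total_ss = 0.0
resid_ss = 0.0
for x, y in points:
    score = (x - mx) * ax + (y - my) * ay
    rx, ry = mx + score * ax, my + score * ay
    total_ss += (x - mx) ** 2 + (y - my) ** 2
    resid_ss += (x - rx) ** 2 + (y - ry) ** 2
retained = 100 * (1 - resid_ss / total_ss)

print("explained:", [round(e, 2) for e in explained])
print("variance retained by the 1D projection:", round(retained, 2))
```

The two printed numbers for the first component agree, which is the whole point: 'explained' is the share of the variance that survives when you keep only that component.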
In your example, 99% visually means that almost all the blue points lie on the big green arrow, with very little variation in the direction of the small green arrow.
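To see how tight the points have to be before you reach 99%, here is a rough sketch in plain Python. The helper names explained_2d and make_cloud and the spread values are made up for illustration. It shrinks the spread across the line while keeping the spread along it fixed, and prints the share of the first component each time.

```python
import math
import random

def explained_2d(points):
    """Percentage of total variance along each principal axis of 2D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lams = (half_trace + root, half_trace - root)
    total = lams[0] + lams[1]
    return [100 * lam / total for lam in lams]

def make_cloud(spread_across, n=100, spread_along=3.0, seed=0):
    """Hypothetical data: points along the diagonal with noise across it."""
    rng = random.Random(seed)
    pts = []
    for _ in range(n):
        t = rng.gauss(0, spread_along)
        s = rng.gauss(0, spread_across)
        pts.append(((t - s) / math.sqrt(2), (t + s) / math.sqrt(2)))
    return pts

# Smaller spread across the line means a larger share for the first component
results = {s: explained_2d(make_cloud(s))[0] for s in (1.0, 0.3, 0.1)}
for s, pct in results.items():
    print(f"spread across the line = {s}: first component explains {pct:.2f}%")
```

With a spread of 3 along the line, the spread across it has to drop to roughly a tenth of that before the first component passes 99%, since the share depends on the ratio of variances (squared spreads), not of the spreads themselves.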
Of course it is way more difficult to visualize with 1000 dimensions instead of 2, but I hope you understand.
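Even without a picture, the number is defined the same way in any dimension. As a sketch, assuming \lambda_i denotes the variance along the i-th principal component (what MATLAB returns in latent) and k is the number of components returned:

```latex
\text{explained}_i = 100 \cdot \frac{\lambda_i}{\sum_{j=1}^{k} \lambda_j},
\qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_k \ge 0
```

So explained_1 > 99 just says that \lambda_1 is more than 99 times larger than the variance of all the other components added together.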