Reducing dimensionality on training data with PCA in Matlab


This is a follow-up question to:

PCA Dimensionality Reduction

In order to classify the new 10-dimensional test data, do I have to reduce the training data down to 10 dimensions as well?

I tried:

X = bsxfun(@minus, trainingData, mean(trainingData,1));           
covariancex = (X'*X)./(size(X,1)-1);                 
[V D] = eigs(covariancex, 10);   % reduce to 10 dimension
Xtrain = bsxfun(@minus, trainingData, mean(trainingData,1));  
pcatrain = Xtest*V;

But using the classifier with this and the 10-dimensional test data produces very unreliable results. Is there something that I am doing fundamentally wrong?

Edit:

X = bsxfun(@minus, trainingData, mean(trainingData,1));           
covariancex = (X'*X)./(size(X,1)-1);                 
[V D] = eigs(covariancex, 10);   % reduce to 10 dimension
Xtrain = bsxfun(@minus, trainingData, mean(trainingData,1));  
pcatrain = Xtest*V;

X = bsxfun(@minus, pcatrain, mean(pcatrain,1));           
covariancex = (X'*X)./(size(X,1)-1);                 
[V D] = eigs(covariancex, 10);   % reduce to 10 dimension
Xtest = bsxfun(@minus, test, mean(pcatrain,1));  
pcatest = Xtest*V;

There is 1 answer

lejlot

You have to reduce both the training and the test data, and both in the same way. Once you have the projection matrix from PCA on the training data, use that same matrix (together with the training mean) to reduce the dimensionality of the test data. In short, you need one fixed transformation that is applied to both training and test samples.
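To make the "one fixed transformation" concrete, it can be written out as below. The symbols are notation introduced here for illustration, not taken from the question: n is the number of training samples, x_i are the training samples as column vectors, \mu is the training mean, C is the training covariance, and V_k holds the k leading eigenvectors of C as columns.

```latex
\mu = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
C = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \mu)(x_i - \mu)^{\top} ,
\qquad
z = V_k^{\top}\,(x - \mu)
```

Here x can be any sample, training or test. Both \mu and V_k are estimated from the training data only and then reused unchanged for the test data.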

Using your code:

% first, center both sets using the mean of the TRAINING data
mu     = mean(Xtrain,1);   % compute once, before Xtrain is overwritten
Xtrain = bsxfun(@minus, Xtrain, mu);
Xtest  = bsxfun(@minus, Xtest, mu);

% Compute PCA
covariancex = (Xtrain'*Xtrain)./(size(Xtrain,1)-1);                 
[V, D] = eigs(covariancex, 10);   % keep the 10 leading eigenvectors

pcatrain = Xtrain*V;
% here you should train your classifier on pcatrain and ytrain (correct labels)

pcatest = Xtest*V;
% here you can test your classifier on pcatest using ytest (compare with correct labels)
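If the Statistics and Machine Learning Toolbox is available, the same idea can probably be expressed with the built-in pca function, which also returns the training mean. This is only a sketch under that assumption: the synthetic data, the sizes (200 x 50 and 40 x 50), and the variable names mu, coeff, and k are made up for illustration and are not from the question.

```matlab
% Sketch only: assumes the Statistics and Machine Learning Toolbox (pca).
rng(0);                           % reproducible synthetic data
Xtrain = randn(200, 50);          % hypothetical training set, 200 samples x 50 features
Xtest  = randn(40, 50);           % hypothetical test set, 40 samples x 50 features
k = 10;                           % number of components to keep

% Fit PCA on the training data only.
% coeff is 50 x k, pcatrain is 200 x k, mu is the 1 x 50 training mean.
[coeff, pcatrain, ~, ~, ~, mu] = pca(Xtrain, 'NumComponents', k);

% Project the test data with the TRAINING mean and TRAINING components.
pcatest = bsxfun(@minus, Xtest, mu) * coeff;

% Sanity checks on the shapes of the reduced data.
assert(isequal(size(pcatrain), [200 k]));
assert(isequal(size(pcatest),  [40 k]));
```

The point of the sketch is the same as in the code above: nothing is re-estimated on the test set, so the classifier sees training and test samples in the same 10-dimensional coordinate system.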