What is a better way to change the training and testing percentages during the splitting process?

Using PCA and the Yale database, I am working on face recognition in MATLAB, randomly splitting the data into 20% for training and 80% for testing. I get the following error:

Index in position 2 exceeds array bounds (must not exceed 29)

The code is below; any help is appreciated:

dataset = load('yale_FaceDataset.mat');

trainSz = round(dataset.samples*0.2);
testSz = round(dataset.samples*0.8);

trainSetCell = cell(1,trainSz*dataset.classes);
testSetCell = cell(1,testSz*dataset.classes);

j = 1;
k = 1;
m = 1;
for i = 1:dataset.classes
    % training set
    trainSetCell(k:k+trainSz-1) = dataset.images(j:j+trainSz-1);
    trainLabels(k:k+trainSz-1) = dataset.labels(j:j+trainSz-1);
    k = k+trainSz;
    % test set
    testSetCell(m:m+testSz-1) = dataset.images(j+trainSz:j+dataset.samples-1);
    testLabels(m:m+testSz-1) = dataset.labels(j+trainSz:j+dataset.samples-1);
    m = m+testSz;
    j = j+dataset.samples;
end
% convert the data from a cell into a matrix format
numImgs = length(trainSetCell);
trainSet = zeros(numImgs,numel(trainSetCell{1}));
for i = 1:numImgs
    trainSet(i,:) = reshape(trainSetCell{i},[],1);
end
numImgs = length(testSetCell);

testSet = zeros(numImgs,numel(testSetCell{1}));
for i = 1:numImgs
    testSet(i,:) = reshape(testSetCell{i},[],1);
end


%% applying PCA
% compute the mean face
mu = mean(trainSet)';

% centre the training data
trainSet = trainSet - (repmat(mu,1,size(trainSet,1)))';

% generate the eigenfaces(features of the training set)
eigenfaces = pca(trainSet);

% set the number of principal components
Ncomponents = 100;

% Out of the generated components, we keep "Ncomponents"
eigenfaces = eigenfaces(:,1:Ncomponents);

% generate training features
trainFeatures = eigenfaces' * trainSet';

% Subspace projection
% centre features
testSet = testSet - (repmat(mu,1,size(testSet,1)))';

% subspace projection
testFeatures = inv(eigenfaces'*eigenfaces) * eigenfaces' * testSet';

mdl = fitcdiscr(trainFeatures',trainLabels);
labels = predict(mdl,testFeatures');


% find the images that were recognised and their respect. labels
correctRec = find(testLabels == labels');
correctLabels = labels(correctRec);

% find the images that were NOT recognised and their respect. labels
falseRec = find(testLabels ~= labels');
falseLabels = labels(falseRec);


% compute and display the recognition rate
result = length(correctRec)/length(testLabels)*100;
fprintf('The recognition rate is: %0.3f \n',result);

% divide the images into : recognised and unrecognised
correctTest = testSetCell(correctRec);
falseTest = testSetCell(falseRec);

% display some recognised samples and their respective labels
imgshow(correctTest(1:8),correctLabels(1:8));

% display all unrecognised samples and their respective labels
imgshow(falseTest(1:length(falseTest)), falseLabels(1:length(falseTest)));

There is 1 answer

Max On

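It would help if you also posted the full error message, including the line number, and stripped the code down to the essentials. My first guess was the loop, since j is incremented by j = j+dataset.samples; on every pass and then reused in j:j+trainSz-1. That indexing is linear, though, so it would not produce an error about position 2. A limit of 29 on the second index points instead at eigenfaces(:,1:Ncomponents): pca returns at most one component fewer than the number of training images. If this is the standard Yale set (15 subjects, 11 images each), a 20% split leaves only 30 training images, so Ncomponents = 100 is out of range.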
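
The bound of 29 in the error message can be reproduced with a small sketch: for n observations with more variables than observations, pca returns at most n-1 components, so 30 training images give 29. The data below is random stand-in data, not the real images, and the sizes (30 rows of 1000 pixels) are assumptions about the file. The last two lines show one possible fix, capping Ncomponents at what pca actually returns:

```matlab
% Stand-in data: 30 "images" with 1000 pixels each (sizes are assumptions)
rng(0);                                  % reproducible random numbers
trainSet = rand(30, 1000);

% With n = 30 observations and p = 1000 variables, pca returns n-1 columns
eigenfaces = pca(trainSet);
maxComponents = size(eigenfaces, 2);

% Cap the requested number of components at what is available
Ncomponents = min(100, maxComponents);
eigenfaces = eigenfaces(:, 1:Ncomponents);
```

With a larger training share the cap has no effect until the number of training images exceeds Ncomponents + 1.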

Also, there is no randomness in your indexing: the loop always takes the first images of each class for training. The easiest fix is to use the built-in cvpartition function:

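% split data: 'HoldOut',0.8 reserves 80% of the observations for testing
cvp = cvpartition(Lbl,'HoldOut',0.8);
lgTrn = cvp.training;   % logical index of the training observations
lgTst = cvp.test;       % logical index of the test observations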

You can pass either the number of observations or the actual class vector (Lbl in this case) as the first input. With the class vector, cvpartition picks random subsets that reflect the original distribution of the individual classes.
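
As a rough sketch of how the logical indices could replace the manual loop, the example below splits a cell array of images and its label vector with a stratified hold-out. The names labels and images and the sizes (15 classes, 11 samples each) are assumptions standing in for dataset.labels and dataset.images; the hold-out fraction is the share reserved for testing, so 0.8 leaves roughly 20% of each class for training:

```matlab
% Stand-in data: 15 classes with 11 samples each (assumed sizes)
rng(0);                                  % reproducible split
labels = repelem(1:15, 11)';             % 165-by-1 class vector
images = num2cell(rand(165, 1));         % placeholder for the image cell array

% Hold out 80% for testing; passing the label vector stratifies by class
cvp = cvpartition(labels, 'HoldOut', 0.8);
isTrain = training(cvp);                 % logical index, true for training rows
isTest  = test(cvp);                     % logical index, true for test rows

% Apply the same indices to the images and the labels
trainSetCell = images(isTrain);
testSetCell  = images(isTest);
trainLabels  = labels(isTrain);
testLabels   = labels(isTest);
```

Changing the percentages then only means changing the single hold-out fraction, and the training and test sizes always add up to the full data set.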