Cross-validation and ROC curves in MATLAB: how to plot the mean ROC curve?

I am using k-fold cross-validation with k = 10, so I have 10 ROC curves that I would like to average. I can't simply average the Y values returned by perfcurve, because the returned vectors differ in size from fold to fold.

[X1,Y1,T1,AUC1] = perfcurve(t_test(1),resp(1),1);
.
.
.
[X10,Y10,T10,AUC10] = perfcurve(t_test(10),resp(10),1);

How can I solve this and plot the average of the 10 ROC curves?

There are 2 answers

Antonio Mendes

I solved it with MATLAB's perfcurve. Instead of a single vector, I pass a list (cell array) of 1xn vectors for both the labels and the scores. perfcurve then treats each element as one fold of the k-fold cross-validation and returns the average ROC curve with its confidence interval, as well as the AUC with its confidence interval.

[X1,Y1,T1,AUC1] = perfcurve(t_test_list,resp_list,1);

t_test_list and resp_list are lists of size 1xk (k is the number of folds), and each element is a 1xn vector of labels or scores, respectively. For fold i, the network output and the targets are:

resp = nnet(x_test(i));
t_test_act = t_test(i); 

resp is a 2xn matrix (n is the number of predicted samples; there are two classes).

t_test_act contains the labels of the current test set. It is also a 2xn matrix of 0s and 1s: each column holds one 1 and one 0, indicating the true class of the sample. The first row of each matrix goes into the lists:

resp_list{i} = resp(1,:);          % scores
t_test_list{i} = t_test_act(1,:);  % labels
[X1,Y1,T1,AUC1] = perfcurve(t_test_list,resp_list,1);
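To make the steps above concrete, here is a minimal end-to-end sketch. It assumes the Statistics and Machine Learning Toolbox is available and uses synthetic labels and scores in place of real network outputs; the fold sizes, the score model, and the XVals grid are all assumptions made for illustration. Passing 'XVals' asks perfcurve to average the curves vertically at fixed false positive rates, so Y comes back with three columns (mean, lower bound, upper bound).

```matlab
% Synthetic stand-in for real fold outputs (assumption: 10 folds, two classes)
rng(0);                          % reproducible random numbers
k = 10;
t_test_list = cell(1, k);        % labels, one 1xn vector per fold
resp_list   = cell(1, k);        % scores, one 1xn vector per fold
for i = 1:k
    n = 40 + randi(20);                        % fold size varies
    labels = double(rand(1, n) > 0.5);         % true class: 0 or 1
    scores = 0.3 * labels + 0.7 * rand(1, n);  % higher on average for class 1
    t_test_list{i} = labels;
    resp_list{i}   = scores;
end

% Cell-array inputs make perfcurve treat each cell as one fold.
% Fixing XVals requests averaging at those false positive rates.
xq = 0:0.05:1;
[X, Y, T, AUC] = perfcurve(t_test_list, resp_list, 1, 'XVals', xq);

% Y(:,1) is the mean TPR; Y(:,2) and Y(:,3) are the lower and upper bounds.
errorbar(X, Y(:,1), Y(:,1) - Y(:,2), Y(:,3) - Y(:,1));
xlabel('False positive rate');
ylabel('True positive rate');
```

AUC is likewise returned as three values: the mean followed by its lower and upper bounds.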
saastn

So, you have k curves with different numbers of points, all bounded to the [0, 1] interval in both dimensions. First, calculate interpolated values for each curve at a common set of query points. The new curves then all have the same number of points, so you can compute their mean. The interp1 function does the interpolation.

%% generating sample data
k = 10;
X = cell(k, 1);
Y = cell(k, 1);
hold on;
for i=1:k
    n = 10+randi(10);
    X{i} = sort([0 1 rand(1, n)]);
    Y{i} = sort([0 1 rand(1, n)].^.5);
end

%% Calculating interpolations
% location of query points
X2 = linspace(0, 1, 50);
n = numel(X2);
% initializing values for different curves at different query points
Y2 = zeros(k, n);
for i=1:k
    % finding interpolated values for i-th curve
    Y2(i, :) = interp1(X{i}, Y{i}, X2);
end
% finding the mean
meanY = mean(Y2, 1);
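One caveat when replacing the random sample data with real perfcurve output: ROC curves usually contain repeated X values (the vertical steps), and interp1 needs distinct sample points. A minimal sketch of one way to handle this, assuming you want to keep the highest Y for each distinct X (the sample vectors below are made up for illustration):

```matlab
% Hypothetical ROC points for one fold, with repeated X values
X = [0 0 0 0.25 0.25 0.5 1 1];
Y = [0 0.2 0.4 0.4 0.6 0.8 0.8 1];

% Keep the last point for each distinct X; because Y is non-decreasing,
% that is also the highest Y at that X.
[Xu, ia] = unique(X, 'last');
Yu = Y(ia);

% Interpolate on a common grid, as in the loop above
X2 = linspace(0, 1, 5);
Yq = interp1(Xu, Yu, X2, 'previous');
% Xu is [0 0.25 0.5 1] and Yq is [0.4 0.6 0.8 0.8 1]
```

Keeping the first point for each X instead (unique(X, 'first')) would trace the lower edge of each vertical step; which convention is appropriate depends on how you want ties handled.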


Note that the choice of interpolation method can affect your results. ROC data form a staircase, so to read exact values off such a curve you should use previous-neighbor interpolation instead of linear interpolation, which is the default method of interp1:

Y2(i, :) = interp1(X{i}, Y{i}, X2); % linear
Y3(i, :) = interp1(X{i}, Y{i}, X2, 'previous');
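A small numeric example may help show the difference between the two methods. The three-point staircase below is hypothetical; the query points fall between the steps, where the two methods disagree:

```matlab
% Hypothetical staircase: the curve jumps at X = 0.5 and X = 1
X  = [0 0.5 1];
Y  = [0.2 0.7 1];
xq = [0.25 0.75];

y_linear   = interp1(X, Y, xq);              % default 'linear'
y_previous = interp1(X, Y, xq, 'previous');  % holds the last known value
% y_linear is [0.45 0.85], y_previous is [0.2 0.7]
```

Linear interpolation cuts diagonally across each step, so it reports values the staircase never takes between its sample points.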

[Figure: linear vs. previous-neighbor interpolation of a curve]

This is how it affects the final results:

[Figure: mean curves obtained with linear and with previous-neighbor interpolation]