A region of my ROC curve is below the random line. How should I modify the confusion matrix?


In the following ROC curve, the curve goes below the random line when the threshold is low. Why does this happen? The confusion matrix is shown below. My question is: which elements of the confusion matrix (TP, FP, TN, FN) should be increased or decreased to get the ROC curve above the random line?

[Figure: ROC curve]

                Predicted Neg   Predicted Pos
Actual Neg      1656 (TN)       860 (FP)
Actual Pos       145 (FN)       331 (TP)

Answer by Juan Kania-Morales (accepted answer)

Preliminaries

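predict_proba, called on your model object, returns the predicted probability of each class for each row of your data. For a binary model, the column for event=1 (predict_proba(X)[:, 1]) holds the predicted probability of event=1, which is the score the ROC curve is built from.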

ROC

When plotting the ROC curve, the corresponding method sorts your data by the predicted probability of event=1 in descending order and sweeps the classification threshold from high to low. The ROC curve tells you precisely which combinations of TPR and FPR you can achieve with your model's predictions.
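The sort-and-sweep procedure described above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: roc_points is a hypothetical helper name, it assumes labels are 0/1 with both classes present, and rows with tied scores share one threshold. scikit-learn's roc_curve performs the same kind of sweep, although by default (drop_intermediate=True) it also drops some points that would not change the plotted curve.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists, sweeping the threshold from high to low.

    Hypothetical helper; assumes y_true contains only 0/1 and both classes occur.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # Sort row indices by predicted probability of event=1, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    for rank, i in enumerate(order):
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only when the next score differs, so ties share one threshold.
        is_last = rank == len(order) - 1
        if is_last or scores[order[rank + 1]] != scores[i]:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


fpr, tpr = roc_points([1, 0, 1, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.55, 0.2])
print([round(x, 2) for x in fpr])  # [0.0, 0.0, 0.33, 0.33, 0.33, 0.67, 1.0]
print([round(x, 2) for x in tpr])  # [0.0, 0.33, 0.33, 0.67, 1.0, 1.0, 1.0]
```

Each step up the list moves the threshold past one more row, so every ROC point is just the running counts of positives and negatives seen so far.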

What this ROC shape says is:

  1. The predicted probability of event=1 generated by your model matches the true probability of event=1 better than random assignment (represented by the "random line") for about 85% of your data. Precisely speaking, these are the 85% of rows with the highest predicted probability of event=1.
  2. The predicted probability of event=1 generated by your model matches the true probability of event=1 worse than random assignment (represented by the "random line") for about 15% of your data. Precisely speaking, these are the 15% of rows with the lowest predicted probability of event=1.

I have taken the values 85% and 15% from your chart: they are my eyeball estimate of the point where the ROC curve crosses the diagonal (the "random line") and should be considered illustrative only.
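To replace an eyeball estimate with numbers, you could scan the ROC points for places where TPR < FPR, i.e. strictly below the diagonal. A minimal sketch, assuming you already have fpr, tpr and thresholds sequences (for example from scikit-learn's roc_curve) plus the counts of actual positives and negatives; below_diagonal is a hypothetical helper. Because the ROC x-axis is FPR rather than the share of rows, the sketch also converts each such point to the share of rows predicted positive, (TP + FP) / N. The ROC values in the example are made up.

```python
def below_diagonal(fpr, tpr, thresholds, n_pos, n_neg):
    """Return (threshold, share_of_rows_predicted_positive) for points with TPR < FPR.

    Hypothetical helper; n_pos / n_neg are the counts of actual positives / negatives.
    """
    total = n_pos + n_neg
    result = []
    for f, r, t in zip(fpr, tpr, thresholds):
        if r < f:  # strictly below the random line
            # At this point TP + FP = TPR * n_pos + FPR * n_neg rows are predicted positive.
            share = (r * n_pos + f * n_neg) / total
            result.append((t, share))
    return result


# Made-up ROC points for illustration only (not read from the question's chart).
fpr = [0.0, 0.2, 0.5, 0.85, 0.95, 1.0]
tpr = [0.0, 0.5, 0.8, 0.86, 0.9, 1.0]
thresholds = [1.0, 0.9, 0.6, 0.3, 0.1, 0.02]
print(below_diagonal(fpr, tpr, thresholds, n_pos=100, n_neg=100))
```

With the question's class counts you would pass n_pos=476 and n_neg=2516 (the row totals of the confusion matrix).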

One possible reason for this phenomenon is overfitting related to a one-hot encoded categorical feature whose categories "behave" differently in different sub-populations of your data set. You might want to experiment with excluding some of the variables before fitting the model and look for improvements in the shape of your ROC curve.

Confusion Matrix

The confusion matrix is derived from the same predicted probabilities of event=1 generated by your model. A single confusion matrix corresponds to one specific threshold on the predicted probability, used to assign each row either prediction=1 or prediction=0. A single confusion matrix is therefore represented by a single point on the ROC curve, so you can't actually change the ROC shape by manipulating confusion matrix elements. You should reason the other way around: the ROC shape tells you which confusion matrices your model can achieve.
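The one-matrix-one-point relation can be checked directly on the confusion matrix from the question. A minimal sketch with hypothetical helper names (confusion_at, roc_point), assuming rows whose predicted probability is at or above the threshold get prediction=1:

```python
def confusion_at(y_true, proba, threshold):
    """Build (tn, fp, fn, tp) for one threshold; proba >= threshold -> prediction=1."""
    tn = fp = fn = tp = 0
    for y, p in zip(y_true, proba):
        pred = 1 if p >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_point(tn, fp, fn, tp):
    """Map one confusion matrix to its single ROC point (FPR, TPR)."""
    fpr = fp / (fp + tn)  # share of actual negatives predicted positive
    tpr = tp / (tp + fn)  # share of actual positives predicted positive
    return fpr, tpr


# Confusion matrix from the question: TN=1656, FP=860, FN=145, TP=331.
fpr, tpr = roc_point(tn=1656, fp=860, fn=145, tp=331)
print(f"FPR={fpr:.3f}, TPR={tpr:.3f}")  # FPR=0.342, TPR=0.695

# Made-up scores: each threshold yields a different matrix, i.e. a different ROC point.
y_true = [1, 0, 1, 0, 1, 0]
proba = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
for t in (0.8, 0.5, 0.2):
    print(t, roc_point(*confusion_at(y_true, proba, t)))
```

So the matrix in the question is the single point (0.342, 0.695), which lies above the diagonal; changing the threshold moves you to another point on the same curve rather than changing the curve itself.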

Hope this helps :-)