ROC Curve is convex


I am doing a ROC plot (and AUC calculation) of default frequencies, using logistic regression with a single multi-class feature, 'sub_grade'. Assume lcd is a dataframe containing the initial data.

# Imports (assuming scikit-learn, with sklearn.linear_model imported as lm)
import sklearn.linear_model as lm
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve

# Assign only sub_grade as a feature, Default as response
# (double brackets keep X two-dimensional, as scikit-learn expects)
X = lcd[['sub_grade']]
y = lcd['Default']

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.50, random_state=123)

logreg = lm.LogisticRegression()
logreg.fit(Xtrain, ytrain)

# Get classification probabilities for the positive class from log reg
y_probas = logreg.predict_proba(Xtest)[:, 1]
# Generate ROC curve from ytest and y_probas
fpr, tpr, thresholds = roc_curve(ytest, y_probas)

The resulting ROC curve is convex, and the AUC score is ~0.35. Why is this? I thought ROC curves rank the classifications by predicted probability. This outcome would imply that the classes with the highest percentage of defaults have the lowest predicted probability of defaulting.

Am I interpreting this correctly?


There are 2 answers

GPB:

Update: the issue lay with how I was using the LogisticRegression classifier. The coefficient changes sign if the order of the feature's categories is reversed, which inverts the predicted probabilities. I must not have understood this bit.
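A minimal sketch of this effect, using synthetic data (the encoding scheme and numbers here are hypothetical, not from the question): if an ordinal category is encoded in the opposite order, the fitted coefficient simply flips sign, and with it the direction of the predicted probabilities.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Hypothetical ordinal encoding of sub_grade: 0 = best grade, 6 = worst
grades = rng.integers(0, 7, size=1000)
# Default probability rises with the encoded grade
y = (rng.random(1000) < 0.05 + 0.1 * grades).astype(int)

# Fit once with the encoding as-is, once with the category order reversed
fwd = LogisticRegression().fit(grades.reshape(-1, 1), y)
rev = LogisticRegression().fit((6 - grades).reshape(-1, 1), y)

print(fwd.coef_[0, 0])  # positive: higher grade code -> higher default odds
print(rev.coef_[0, 0])  # negative: same model, encoding reversed
```

So the model itself is unchanged; only the sign of the coefficient (and hence the direction of the probability ranking) depends on the category order.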

dukebody:

A ROC-AUC score lower than 0.5 means that your classifier is predicting worse than random, i.e. the pattern it learns from the training data is the opposite of the one later found in the test data.

This seldom happens, and can be corrected easily by predicting 1 - current_probability instead.
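A toy illustration of that fix (the numbers are made up for the example): a classifier that learned the pattern backwards scores below 0.5, and flipping its probabilities mirrors the AUC around 0.5.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])
# Scores from a classifier that learned the pattern backwards:
# every negative example outranks every positive one
p = np.array([0.9, 0.8, 0.2, 0.1, 0.7, 0.3])

print(roc_auc_score(y_true, p))      # 0.0 (perfectly wrong)
print(roc_auc_score(y_true, 1 - p))  # 1.0 (perfectly right)
```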

Reasons why this might be happening:

  • The training and the test data patterns differ heavily, or there is no real global pattern.
  • Your model is overfitting pretty hard.

In your case, since you are using only one feature, overfitting due to too many parameters is unlikely. My guess is that there is no real correlation between your feature and your target, so you are fitting only noise.
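This can be sketched with fully synthetic data (the dataset below is made up to illustrate the point, it is not the asker's data): when the feature carries no signal, the test-set AUC hovers around 0.5, and on any single train/test split it can easily land below it, as the asker observed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.random((200, 1))          # feature with no relation to the target
y = rng.integers(0, 2, size=200)  # random binary labels

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=123)
probs = LogisticRegression().fit(Xtr, ytr).predict_proba(Xte)[:, 1]
auc = roc_auc_score(yte, probs)
print(auc)  # near 0.5; a single split can fall noticeably below it
```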