How to get a confidence measure for each prediction of an SVM classifier in Python (scikit-learn)


I have the following dataframe, new_df:

    BankNum   | ID    | Labels
    ----------|-------|-------
    0098-7772 | AB123 | High
    0098-7772 | ED245 | High
    0098-7772 | ED343 | High
    0870-7771 | ED200 | Mod
    0870-7771 | ED100 | Mod
    0098-2123 | GH564 | Low
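To reproduce the setup, here is a minimal sketch that rebuilds the six sample rows above as a pandas DataFrame. It assumes the columns hold plain strings; the real data is much larger, so this only helps check the preprocessing steps.

```python
import pandas as pd

# Rebuild the six example rows shown above (string columns assumed)
new_df = pd.DataFrame({
    'BankNum': ['0098-7772', '0098-7772', '0098-7772', '0870-7771', '0870-7771', '0098-2123'],
    'ID': ['AB123', 'ED245', 'ED343', 'ED200', 'ED100', 'GH564'],
    'Labels': ['High', 'High', 'High', 'Mod', 'Mod', 'Low'],
})
print(new_df)
```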

I am using scikit-learn's SVC to predict the labels 'High', 'Mod', and 'Low' as follows:

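    import numpy as np
    from sklearn import metrics, svm
    from sklearn.model_selection import cross_val_score, train_test_split
    from sklearn.preprocessing import LabelEncoder

    # Strip the dash from BankNum and convert it to a number
    # (np.float128 is not available on every platform; SVC works in float64 anyway)
    new_df['BankNum'] = new_df['BankNum'].str.replace('-', '', regex=False).astype(np.float64)

    columns = ['BankNum', 'ID']

    # Use one encoder per column so each mapping can still be inverted later
    id_encoder = LabelEncoder()
    new_df['ID'] = id_encoder.fit_transform(new_df['ID'])

    label_encoder = LabelEncoder()
    new_df['Labels'] = label_encoder.fit_transform(new_df['Labels'])

    X_train, X_test, y_train, y_test = train_test_split(
        new_df[columns], new_df['Labels'], test_size=0.2, random_state=42)

    clf = svm.SVC(gamma=0.001, C=100., probability=True, random_state=42)

    # Mean accuracy over 8 cross-validation folds of the training set
    scores = cross_val_score(clf, X_train, y_train, cv=8)
    print("Cross Validation Score:")
    print(scores.mean())

    clf.fit(X_train, y_train)

    predicted = clf.predict(X_test)
    print("Accuracy:")
    print(np.mean(predicted == y_test))
    print(metrics.classification_report(y_test, predicted))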
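When a single LabelEncoder is reused for several columns, each fit_transform call overwrites the previous mapping, so one encoder per column is safer. LabelEncoder also assigns codes in sorted order, so, assuming the labels are exactly 'High', 'Mod' and 'Low', the classes 0, 1 and 2 in the report below correspond to High, Low and Mod. A small sketch to confirm the mapping on the sample labels:

```python
from sklearn.preprocessing import LabelEncoder

# Sample labels from the six rows above (assumed to be the only label strings)
labels = ['High', 'High', 'High', 'Mod', 'Mod', 'Low']

# LabelEncoder sorts the distinct values, so the integer codes follow alphabetical order
label_encoder = LabelEncoder()
codes = label_encoder.fit_transform(labels)

# Map each integer code back to its original string label
code_to_label = {i: str(c) for i, c in enumerate(label_encoder.classes_)}
print(code_to_label)  # {0: 'High', 1: 'Low', 2: 'Mod'}
print(codes)          # [0 0 0 2 2 1]
```

label_encoder.inverse_transform(predicted) turns the model's integer predictions back into the original strings.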

I have two questions:

1.) For the classification report, I'm getting output like this:

                   precision    recall  f1-score   support

              0       0.00      0.00      0.00      4780
              1       0.94      1.00      0.97    104719
              2       0.00      0.00      0.00      1425

    avg / total       0.89      0.94      0.92    110924

Why do labels 0 and 2 get 0.00 precision? Could this be due to class imbalance? There are about 80,893 High labels, 11,798 Mod labels, and 279,608 Low labels. Or is SVM not a good model for this?
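A row of 0.00 precision and 0.00 recall usually means the model never predicted that class at all, which is common when a heavily imbalanced problem lets the classifier fall back to the majority class (scikit-learn then reports the undefined precision as 0.0). Unscaled features may also contribute: with BankNum values around 10^7 and gamma=0.001, the RBF kernel between most pairs of points is essentially zero. The sketch below uses made-up toy labels to reproduce the pattern, then computes the 'balanced' class weights that SVC(class_weight='balanced') would use for the counts quoted above, assuming the encoded order 0 = High, 1 = Low, 2 = Mod. Whether reweighting is enough for this data is something to test, not a given.

```python
import numpy as np
from sklearn.metrics import precision_score
from sklearn.utils.class_weight import compute_class_weight

# Toy labels (made up): a classifier that always predicts the majority class 1
y_true = np.array([0, 1, 1, 1, 1, 2])
y_pred = np.array([1, 1, 1, 1, 1, 1])

# Classes 0 and 2 are never predicted, so their precision is undefined;
# zero_division=0 reports it as 0.0, as in the classification report
per_class_precision = precision_score(y_true, y_pred, average=None, zero_division=0)
print(per_class_precision)

# Class counts from the question, assuming encoded order 0 = High, 1 = Low, 2 = Mod
counts = {0: 80893, 1: 279608, 2: 11798}
y_full = np.concatenate([np.full(n, c) for c, n in counts.items()])

# 'balanced' weight = n_samples / (n_classes * count_of_class), so rare classes weigh more
weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1, 2]), y=y_full)
print(dict(zip([0, 1, 2], weights)))
```

Passing class_weight='balanced' to svm.SVC applies these weights to the penalty C per class; scaling the features (for example with StandardScaler in a Pipeline) is a separate change worth trying alongside it.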

2.) I want to get a confidence score for each prediction. I searched and found the following:

    from sklearn.metrics import roc_auc_score as AUC  # assumption: "AUC" here is roc_auc_score

    p = clf.predict_proba(X_test)
    auc = AUC(y_test, p[:, 1])
    print("SVM AUC", auc)

But I'm getting this error:

    raise ValueError("{0} format is not supported".format(y_type))
    ValueError: multiclass format is not supported
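The error comes from passing three-class labels to an AUC function that expects a binary target (assuming AUC is sklearn.metrics.roc_auc_score or something built on roc_curve). Since scikit-learn 0.22, roc_auc_score accepts the full probability matrix together with multi_class='ovr' or 'ovo'. A sketch on synthetic data; make_classification and its parameters stand in for the real features and are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 3-class data standing in for the real features (an assumption, not the question's data)
X, y = make_classification(n_samples=600, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf = SVC(probability=True, random_state=42)
clf.fit(X_train, y_train)

# Pass the full (n_samples, n_classes) probability matrix, not a single column
p = clf.predict_proba(X_test)
auc_ovr = roc_auc_score(y_test, p, multi_class='ovr')
print("SVM one-vs-rest AUC:", auc_ovr)

# Per-class AUC: treat each class as positive and the other two as negative
for k, cls in enumerate(clf.classes_):
    print(cls, roc_auc_score(y_test == cls, p[:, k]))
```

Note that AUC summarizes the model over the whole test set; it is not a confidence score for an individual prediction.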

How do I get a confidence measure for each prediction, and how should I interpret it? Thanks!
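With SVC(probability=True), a common starting point is predict_proba: the highest probability in each row can be read as the confidence in the predicted class, and the gap to the second-highest shows how torn the model is between two classes. According to the scikit-learn documentation, these probabilities come from Platt scaling fitted with internal cross-validation, so they may be poorly calibrated and predict() can occasionally disagree with the argmax of predict_proba. decision_function gives raw SVM scores (distance-like, not probabilities). A sketch on synthetic data; the dataset and the 250/50 split are assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic 3-class data (an assumption, standing in for the real features)
X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, n_classes=3, random_state=0)
clf = SVC(probability=True, random_state=0).fit(X[:250], y[:250])
X_new = X[250:]

proba = clf.predict_proba(X_new)        # shape (n_samples, n_classes), rows sum to 1
best = proba.argmax(axis=1)
predicted_class = clf.classes_[best]    # class with the highest estimated probability
confidence = proba.max(axis=1)          # that probability, used as a per-prediction confidence

# Gap between the top two probabilities: a small gap means two classes are nearly tied
sorted_proba = np.sort(proba, axis=1)
margin = sorted_proba[:, -1] - sorted_proba[:, -2]

# Raw SVM scores: larger means further from the decision boundary (not a probability)
scores = clf.decision_function(X_new)   # (n_samples, n_classes) with the default 'ovr' shape

for cls, conf, m in list(zip(predicted_class, confidence, margin))[:5]:
    print(f"predicted={cls} confidence={conf:.2f} margin={m:.2f}")

# predict() may disagree with argmax(predict_proba) because the probabilities
# come from a separate Platt-scaling fit
disagree = np.mean(clf.predict(X_new) != predicted_class)
print("fraction where predict() != argmax(predict_proba):", disagree)
```

On imbalanced data in particular, it is worth checking these probabilities against sklearn.calibration.calibration_curve before treating, say, 0.9 as a literal 90% chance of being correct.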

