How to calculate predicted outcomes from training scores using predict_proba for a multi-class target variable


I'm using a diabetes dataset whose target variable has 3 classes. I used a Decision Tree Classifier, optimized its hyperparameters with scikit-learn's RandomizedSearchCV, and fitted the model on the training data. I then computed predict_proba on the test data, which gives the probability of each of the 3 classes for every sample. Now I want to find a cutoff value that I can use to assign the classes, and I'm using the F1 score to choose that cutoff.

Now I'm stuck on how to compute the F1 score in the multi-class case. Will the F1 score metric help me find the cutoff?

Here is the dataset

After preprocessing the data, I split it into training and testing sets.

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

dtree = DecisionTreeClassifier()
params = {'class_weight': [None, 'balanced'],
          'criterion': ['entropy', 'gini'],
          'max_depth': [None, 5, 10, 15, 20, 30, 50, 70],
          'min_samples_leaf': [1, 2, 5, 10, 15, 20],
          'min_samples_split': [2, 5, 10, 15, 20]}
grid_search = RandomizedSearchCV(dtree, cv=10, n_jobs=-1, n_iter=10,
                                 scoring='roc_auc_ovr', verbose=20,
                                 param_distributions=params)
grid_search.fit(X_train, y_train)
mdl = grid_search.best_estimator_  # already refitted on the full training data
test_score = mdl.predict_proba(X_test)

This is the cutoff-selection code I wrote for a binary classifier -

import numpy as np

cutoffs = np.linspace(0.01, 0.99, 99)
true = y_train
train_score = mdl.predict_proba(X_train)[:, 1]
F1_all = []
for cutoff in cutoffs:
    pred = (train_score > cutoff).astype(int)
    TP = ((pred == 1) & (true == 1)).sum()
    FP = ((pred == 1) & (true == 0)).sum()
    TN = ((pred == 0) & (true == 0)).sum()
    FN = ((pred == 0) & (true == 1)).sum()
    F1 = TP / (TP + 0.5 * (FP + FN))
    F1_all.append(F1)
F1_all = np.array(F1_all)  # cast so the boolean mask below works
my_cutoff = cutoffs[F1_all == F1_all.max()][0]
preds = (test_score[:, 1] > my_cutoff).astype(int)  # column 1 = positive-class probability
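For reference, the same sweep can be written more compactly with scikit-learn's f1_score, which avoids hand-computing the confusion-matrix counts. This is a sketch with toy data standing in for y_train and the predicted scores:

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy binary labels and scores standing in for y_train / predict_proba output
rng = np.random.default_rng(0)
true = rng.integers(0, 2, size=200)
train_score = np.clip(true * 0.6 + rng.normal(0.2, 0.25, size=200), 0, 1)

cutoffs = np.linspace(0.01, 0.99, 99)
# F1 at each candidate cutoff
F1_all = np.array([f1_score(true, (train_score > c).astype(int)) for c in cutoffs])
my_cutoff = cutoffs[np.argmax(F1_all)]  # cutoff with the highest F1
```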

There is 1 answer

Answered by Matus Dubrava

There is no cutoff value for the normalized probability output of a multiclass classifier in the same sense as the cutoff value for a binary classifier.

When your output is normalized probabilities for multiple classes and you want to convert this into class labels, you just take the label with the highest assigned probability.
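In code, converting a predict_proba matrix to labels is a one-liner: argmax over the class axis. A sketch with made-up probabilities (for a fitted scikit-learn model, mapping these indices through `mdl.classes_` recovers the original class labels):

```python
import numpy as np

# Example predict_proba output: one row per sample, one column per class
proba = np.array([[0.2, 0.5, 0.3],
                  [0.7, 0.1, 0.2],
                  [0.1, 0.3, 0.6]])

# argmax over columns gives the index of the most probable class per sample
labels = proba.argmax(axis=1)  # -> array([1, 0, 2])
```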

Technically, you could design a custom scheme such as:

  • if class1 has a probability of 10% or more, choose the class1 label; otherwise pick the class with the highest assigned probability

which would be a sort of cutoff for class1, but this is rather arbitrary and I have not seen anyone do it in practice. If you have some deep insight into your problem suggesting that something like this may be useful, then go ahead and build your own "cutoff" formula; otherwise, you should stick with the general approach (argmax of the normalized probabilities).
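That schema could be sketched like this (hypothetical helper; `proba` is a made-up probability matrix, and "class1" is taken to be column 0):

```python
import numpy as np

def predict_with_class1_cutoff(proba, cutoff=0.10):
    """If class 0 ('class1' in the text) reaches `cutoff`, predict it;
    otherwise fall back to the usual argmax."""
    labels = proba.argmax(axis=1)
    labels[proba[:, 0] >= cutoff] = 0
    return labels

proba = np.array([[0.15, 0.60, 0.25],   # class 0 above cutoff -> forced to 0
                  [0.05, 0.30, 0.65]])  # class 0 below cutoff -> plain argmax -> 2
print(predict_with_class1_cutoff(proba))  # [0 2]
```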