Using smoothed labels from 0 to 1 to train an XGB classifier

I want to train an XGB classifier using smoothed labels between 0 and 1 instead of binary labels.

The native XGBoost API (xgb.train) seems to accept smoothed labels for a binary classifier:

from xgboost import XGBClassifier
import numpy as np
import xgboost as xgb
train_data = np.random.rand(20, 10)
train_label = np.random.random(20)
dtrain = xgb.DMatrix(train_data, label=train_label)

test_data = np.random.rand(20, 10)
test_label = np.random.random(20)
dtest = xgb.DMatrix(test_data, label=test_label)

param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic', 'eval_metric': 'auc'}
evallist = [(dtrain, 'train'), (dtest, 'eval')]

bst = xgb.train(params=param, dtrain=dtrain, num_boost_round=10, evals=evallist)
[0] train-auc:0.68952   eval-auc:0.53327
[1] train-auc:0.74847   eval-auc:0.49597
[2] train-auc:0.79158   eval-auc:0.45795
...

However, when I tried to use the sklearn wrapper XGBClassifier, I got the following error.


model = XGBClassifier(**param)
model.fit(train_data, train_label)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_12603/1675654556.py in <cell line: 1>()
----> 1 model.fit(train_data, train_label)

~/.pyenv/versions/btc-p2p/lib/python3.9/site-packages/xgboost/core.py in inner_f(*args, **kwargs)
    618             for k, arg in zip(sig.parameters, args):
    619                 kwargs[k] = arg
--> 620             return func(**kwargs)
    621 
    622         return inner_f

~/.pyenv/versions/btc-p2p/lib/python3.9/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, base_margin_eval_set, feature_weights, callbacks)
   1464                 or not (self.classes_ == expected_classes).all()
   1465             ):
-> 1466                 raise ValueError(
   1467                     f"Invalid classes inferred from unique values of `y`.  "
   1468                     f"Expected: {expected_classes}, got {self.classes_}"

ValueError: Invalid classes inferred from unique values...

I have 2 questions here:

  1. Does the first code example actually take the smoothed labels into account during training, or does it just internally convert the real values to 0 or 1?
  2. Why doesn't the XGBClassifier wrapper work with smoothed labels? Is it possible to get it to work?

Answer by Anay

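Answer 1: In the first code example, train_label and test_label are drawn at random from [0, 1), so they are arbitrary real values rather than smoothed versions of real binary labels; no smoothing happens in the code itself. XGBoost does not convert these values to 0 or 1, though. With objective 'binary:logistic', the sigmoid function is applied to the model's raw margin to produce a probability p, not to the labels. The labels enter only through the log loss, whose gradient with respect to the margin is p - y, so a label of 0.7 pulls the prediction toward 0.7 rather than toward 1. In other words, the smoothed labels are taken into account during training (labels outside [0, 1] are rejected by this objective).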

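Answer 2: XGBClassifier doesn't accept smoothed labels because it treats y as class labels. In fit(), it infers classes_ from the unique values of y and requires them to be exactly 0, 1, ..., n_classes - 1. Twenty distinct random floats fail that check, which raises the ValueError shown in the traceback.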
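The failing check can be approximated in a few lines of NumPy. The helper classes_look_valid is hypothetical: it only mirrors the comparison visible in the traceback (the unique values of y against 0, 1, ..., n_classes - 1) and is not XGBoost's own code.

```python
import numpy as np


def classes_look_valid(y):
    """Hypothetical helper approximating XGBClassifier's label check:
    the unique values of y must be exactly 0, 1, ..., n_classes - 1."""
    classes = np.unique(np.asarray(y))
    expected = np.arange(len(classes))
    return classes.shape == expected.shape and bool((classes == expected).all())


rng = np.random.default_rng(0)
soft = rng.random(20)                 # 20 distinct floats in [0, 1)
binary = np.where(soft >= 0.5, 1, 0)  # thresholded labels

print(classes_look_valid(soft))    # soft labels fail the check
print(classes_look_valid(binary))  # binary labels pass it
```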

To use XGBClassifier anyway, you can preprocess the labels by applying a threshold that turns them into binary labels. Note that this discards the information carried by the smoothed values.

Smoothed to binary

from xgboost import XGBClassifier
import numpy as np
import xgboost as xgb

train_data = np.random.rand(20, 10)
train_label = np.random.random(20)
train_label_binary = np.where(train_label >= 0.5, 1, 0)  # Apply threshold to convert smoothed labels to binary labels
dtrain = xgb.DMatrix(train_data, label=train_label_binary)

test_data = np.random.rand(20, 10)
test_label = np.random.random(20)
test_label_binary = np.where(test_label >= 0.5, 1, 0)  # Apply threshold to convert smoothed labels to binary labels
dtest = xgb.DMatrix(test_data, label=test_label_binary)

param = {'max_depth': 2, 'eta': 1, 'objective': 'binary:logistic', 'eval_metric': 'auc'}
evallist = [(dtrain, 'train'), (dtest, 'eval')]

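bst = xgb.train(params=param, dtrain=dtrain, num_boost_round=10, evals=evallist)

# With binary labels, the sklearn wrapper no longer raises the ValueError
model = XGBClassifier(**param)
model.fit(train_data, train_label_binary)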

Output:

[0] train-auc:0.80500   eval-auc:0.51000
[1] train-auc:0.93500   eval-auc:0.61500
[2] train-auc:0.95000   eval-auc:0.67500
[3] train-auc:1.00000   eval-auc:0.58000
[4] train-auc:1.00000   eval-auc:0.57500
[5] train-auc:1.00000   eval-auc:0.57500
[6] train-auc:1.00000   eval-auc:0.57500
[7] train-auc:1.00000   eval-auc:0.61500
[8] train-auc:1.00000   eval-auc:0.60000
[9] train-auc:1.00000   eval-auc:0.62000