Using base_score with XGBClassifier to provide initial priors for each target class


When using XGBRegressor, it's possible to use the base_score setting to set the initial prediction value for all data points. Typically that value would be set to the mean of the observed value in the training set.

Is it possible to achieve a similar thing with XGBClassifier, by specifying a value for every target class, when the objective parameter is set to multi:softprob?

E.g. counting the occurrences of each target class in the training set and normalizing by the total count would give us:

class      pct_total
--------------------
blue       0.57
red        0.22
green      0.16
black      0.05

So that when beginning its first iteration, XGBClassifier would start with these per-class values for every data point, instead of simply starting with 1 / num_classes for all classes.
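The per-class priors in the table above can be computed directly from the training labels. A minimal sketch, assuming the labels are already integer-encoded (`y_train` and the class proportions here are made-up illustration data matching the table):

```python
import numpy as np

# Hypothetical integer-encoded labels: 0=blue, 1=red, 2=green, 3=black
y_train = np.array([0] * 57 + [1] * 22 + [2] * 16 + [3] * 5)

# Fraction of the training set belonging to each class
priors = np.bincount(y_train) / len(y_train)
print(priors)  # [0.57 0.22 0.16 0.05]
```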

Is it possible to achieve this?

1 Answer

Answered by Ben Reiniger

You can accomplish this using the base_margin parameter. It's described in the docs; the referenced demo is here, but it uses the native API and DMatrix. As the docs note, though, you can also pass base_margin to the XGBClassifier.fit method (with a new enough xgboost).

The shape of base_margin is expected to be (n_samples, n_classes): since xgboost fits one GBM per class for multiclass models, you're providing, for each sample, a base score for each of the per-class GBMs (four columns in the example above). Note also that these values live in the margin (log-odds) space, so transform your priors accordingly. Finally, don't forget to pass base_margin to every prediction call as well; it would be nicer if this were a builtin saved on the model (see the docs linked earlier in this paragraph).