Invalid classes inferred from unique values of `y`. Expected: [0 1 2 3 4 5], got [1 2 3 4 5 6]


I trained a dataset using XGBClassifier, but I get this error locally. The same code works on Colab, and my friends don't have any problem with it either. I don't know what this error means...

Invalid classes inferred from unique values of y. Expected: [0 1 2 3 4 5], got [1 2 3 4 5 6]

This is my code, though I don't think it's the cause:

import time
from xgboost import XGBClassifier

start_time = time.time()
xgb = XGBClassifier(n_estimators = 400, learning_rate = 0.1, max_depth = 3)
xgb.fit(X_train.values, y_train)
print('Fit time : ', time.time() - start_time)

There are 9 answers

Hessah On

It happens because of the version of your xgboost, so try this:

y_train_xgb = y_train.map({1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5})
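If you prefer not to hard-code the mapping, you can build it from the sorted unique labels instead; a minimal sketch, assuming y_train is a pandas Series:

import pandas as pd

# Map each original class label to 0..n_classes-1,
# e.g. {1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5} for the labels in this question.
mapping = {label: idx for idx, label in enumerate(sorted(pd.unique(y_train)))}
y_train_xgb = y_train.map(mapping)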

Yassin El Jakani On

The error comes with the new version of xgboost. Uninstall the current xgboost and install xgboost 0.90:

pip uninstall xgboost 

pip install xgboost==0.90

Jefferson Santos On

That happens because the class column has to start from 0 (as required since version 1.3.2). An easy way to solve that is to use LabelEncoder from the sklearn.preprocessing library.

Solution (works for version 1.6):

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)  # maps the original labels (1..6) to 0..5

Then run your code again:

start_time = time.time()
xgb = XGBClassifier(n_estimators = 400, learning_rate = 0.1, max_depth = 3)
xgb.fit(X_train.values, y_train)
print('Fit time : ', time.time() - start_time)

Javier Moreno On

Try adding stratify to the train_test_split call:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=test_size, stratify=labels)

norrey nordine On

Use Python version 3.7, as used in Colab.

Craig Rodrigues On

I verified in the source code of xgboost that LabelEncoder() was deprecated in version 1.3 with this PR:

https://github.com/dmlc/xgboost/pull/6269/files

And then LabelEncoder() was removed in version 1.6.0 with this PR: https://github.com/dmlc/xgboost/pull/7357

which was then merged here: https://github.com/dmlc/xgboost/commit/3c4aa9b2ead21d11ef1589059db2ea50208c55ea

The approach mentioned by @jefferson-santos to explicitly use LabelEncoder() is correct, and worked for me.
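For reference, a quick way to check which xgboost version you have locally versus on Colab (a minimal check):

import xgboost
print(xgboost.__version__)  # 1.6+ no longer label-encodes the target for you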

SOMDEB SAR On

It's because, in newer XGBoost versions, y_train must be encoded before training, i.e., you must apply a categorical transformation such as a label encoder:

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y_train = le.fit_transform(y_train)

Then fit the XGBoost model on the encoded labels:

from xgboost import XGBClassifier
classifier = XGBClassifier()
classifier.fit(X=X_train, y=y_train)

After training, to compute the confusion matrix you must inverse-transform the predicted y values back to the original labels, as shown:

from sklearn.metrics import confusion_matrix, accuracy_score
y_pred = classifier.predict(X_test)
y_pred = le.inverse_transform(y_pred)
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

Jatin Kishore Patel On

Downgrading to 1.5.0 worked for me.

I also got this warning message during execution:

UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release.

Using the label encoder in 1.6 returns this error for me:

MultiClassEvaluation: label must be in [0, num_class), num_class=6 but found 6 in label
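For reference, the downgrade itself is the same pip approach as in the earlier answer, just pinned to a pre-1.6 release:

pip install xgboost==1.5.0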

aps_s On

If it helps, I just rolled back to version 1.2.1.