Here is my code. It's a binary classification problem and the evaluation metric is AUC. I found a solution on Stack Overflow and implemented it, but it did not work and is still giving me an error.

param_grid = {
    'n_estimators': [1000, 10000],
    'boosting_type': ['gbdt'],
    'num_leaves': [30, 35],
    #'learning_rate': [0.01, 0.02, 0.05],
    #'colsample_bytree': [0.8, 0.95],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    #'reg_alpha': [0.01, 0.02, 0.05],
    #'reg_lambda': [0.01, 0.02, 0.05],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(random_state=42, early_stopping_rounds=10, eval_metric='auc', verbose_eval=20)


grid_search = GridSearchCV(lgb, param_grid=param_grid,
                           scoring='roc_auc', cv=5, n_jobs=-1, verbose=1)

grid_search.fit(X_train, y_train, eval_set=(X_val, y_val))

best_model = grid_search.best_estimator_
start = time()
best_model.fit(X_train, y_train)
Train_time = round(time() - start, 4)

The error happens at best_model.fit(X_train, y_train).

Answer by James Lamb (accepted)
This error happens because you used early stopping during the grid search, but did not use early stopping when fitting the best model on the full training set.

Some keyword arguments you pass into LGBMClassifier are added to the params in the model object produced by training, including early_stopping_rounds.

To disable early stopping, you can use set_params().

best_model = grid_search.best_estimator_

# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)

# remove early_stopping_rounds
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#

best_model.fit(X_train, y_train)

More Details

I made some assumptions to turn your question into a minimal reproducible example. In the future, I recommend doing that when you ask questions here. It will help you get better, faster help.

I installed lightgbm 3.1.0 with pip install lightgbm==3.1.0. I'm using Python 3.8.3 on Mac.

Things I changed from your example to make it an easier-to-use reproduction:

  • removed commented code
  • cut the number of iterations to [10, 100] and num_leaves to [8, 10] so training would run much faster
  • added imports
  • added a specific dataset and code to produce it repeatably

Reproducible example

from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split

param_grid = {
    'n_estimators': [10, 100],
    'boosting_type': ['gbdt'],
    'num_leaves': [8, 10],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(
    random_state=42,
    early_stopping_rounds=10,
    eval_metric='auc',
    verbose_eval=20
)

grid_search = GridSearchCV(
    lgb,
    param_grid=param_grid,
    scoring='roc_auc',
    cv=5,
    n_jobs=-1,
    verbose=1
)

X, y = load_breast_cancer(return_X_y=True)


X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.1,
    random_state=42
)
                                 
grid_search.fit(
    X_train,
    y_train,
    eval_set=(X_test, y_test)
)

best_model = grid_search.best_estimator_

# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)

# remove early_stopping_rounds
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#

best_model.fit(X_train, y_train)