How can I implement RandomizedSearchCV for GradientBoostingRegressor in scikit-learn instead of GridSearchCV?


I am trying to fit a regression model with scikit-learn's GradientBoostingRegressor. I have seen GridSearchCV used for hyperparameter tuning, but to reduce computation time I would like to use RandomizedSearchCV instead. Unfortunately, I could not get the two to work together. Could you please help me implement this?

My script, adapted from a GridSearchCV version, is below. Unfortunately, I could not get it to work with RandomizedSearchCV and the gradient boosting estimator.

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

print("Optimizing Hyperparameters..")
LR = {"learning_rate": [0.001],
      # "n_estimators": [10, 50, 100, 150, 500, 1000, 15000],
      "n_estimators": [1000, 3000, 5000, 7000, 10000],
      "max_depth": [1, 2, 3, 5, 7, 10]}
tuning = RandomizedSearchCV(estimator=GradientBoostingRegressor(), param_distributions=LR)
tuning.fit(X_train, y_train)

print("Best Parameters found: ", tuning.best_params_)

n_parameter = tuning.best_params_["n_estimators"]
lr_parameter = tuning.best_params_["learning_rate"]
md_parameter = tuning.best_params_["max_depth"]

There is 1 answer

Tarick Ali

I ran the following code snippet and everything worked.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# basic proof of concept dataset
X_val = np.random.randn(10, 4)
y_val = np.random.randn(10)

LR = {
    "learning_rate": [0.001],
    "n_estimators": [1000, 3000, 5000, 7000, 10000],
    "max_depth": [1, 2, 3, 5, 7, 10],
}
tuning = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(), param_distributions=LR, scoring="r2"
)
tuning.fit(X_val, y_val)
print("Best Parameters found: ", tuning.best_params_)

It printed:

Best Parameters found:  {'n_estimators': 3000, 'max_depth': 1, 'learning_rate': 0.001}
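Since the goal is to cut computation time, it may help to set the search budget explicitly. By default RandomizedSearchCV samples n_iter=10 candidates (here, 10 of the 30 combinations in the grid) and uses 5-fold cross-validation. The sketch below is one possible setup, not a recommended configuration: the synthetic dataset, the parameter ranges, and the n_iter and cv values are all assumptions chosen to keep the example fast. It also passes error_score="raise" so that a failing fit surfaces its exception instead of being recorded as a NaN score.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for X_train / y_train (assumption, not the asker's data).
X, y = make_regression(n_samples=200, n_features=4, noise=0.1, random_state=0)

# Lists are sampled uniformly; scipy distributions are sampled via .rvs().
# The ranges below are illustrative only.
param_distributions = {
    "learning_rate": loguniform(1e-3, 1e-1),  # continuous, log-scaled
    "n_estimators": randint(50, 201),         # integers 50..200 inclusive
    "max_depth": [1, 2, 3, 5],
}

search = RandomizedSearchCV(
    estimator=GradientBoostingRegressor(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,             # number of sampled candidates (default is 10)
    cv=3,                 # folds per candidate (default is 5)
    scoring="r2",
    random_state=0,       # makes the sampled candidates reproducible
    error_score="raise",  # re-raise fit errors instead of scoring them NaN
)
search.fit(X, y)

print("Best Parameters found: ", search.best_params_)
print("Candidates evaluated: ", len(search.cv_results_["params"]))
```

The total number of fits is n_iter times cv, plus one final refit on the full data, so these two arguments control the runtime directly.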

Since your code runs essentially unchanged on this toy dataset, the problem is likely somewhere else in your script or notebook. Perhaps it has something to do with your dataset?
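One way to test the dataset hypothesis is to run the training arrays through scikit-learn's own input validation before starting the search. The sketch below is a minimal example under assumptions: validate_training_data is a hypothetical helper name, and the arrays are synthetic placeholders for X_train and y_train. It relies on sklearn.utils.check_X_y, which raises ValueError for problems such as NaN or infinite values and mismatched sample counts.

```python
import numpy as np
from sklearn.utils import check_X_y


def validate_training_data(X, y):
    """Hypothetical helper: return (ok, message) from scikit-learn's validation."""
    try:
        check_X_y(X, y, y_numeric=True)
    except ValueError as exc:
        return False, str(exc)
    return True, "ok"


# Synthetic placeholders for X_train / y_train (assumption).
rng = np.random.default_rng(0)
X_good = rng.normal(size=(20, 4))
y_good = rng.normal(size=20)

# Introduce a missing value to show what a failing check looks like.
X_bad = X_good.copy()
X_bad[0, 0] = np.nan

good_ok, good_msg = validate_training_data(X_good, y_good)
bad_ok, bad_msg = validate_training_data(X_bad, y_good)
short_ok, short_msg = validate_training_data(X_good, y_good[:10])

print("clean data:", good_ok)
print("NaN in X:", bad_ok, "-", bad_msg)
print("length mismatch:", short_ok, "-", short_msg)
```

If the check passes on the real data, the cause is more likely in the surrounding code, for example how X_train and y_train are built or which variables are in scope when fit is called.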