I am trying to understand why running the code below for hyperparameter tuning with Optuna gives me different best parameter values even when I run the exact same code with the same random_state = 42. Where is the random part coming from?
import optuna
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial):
    digits = load_digits()
    x, y = digits.data, digits.target
    max_depth = trial.suggest_int("rf_max_depth", 2, 64, log=True)
    max_samples = trial.suggest_float("rf_max_samples", 0.2, 1)
    rf_model = RandomForestClassifier(
        max_depth=max_depth,
        max_samples=max_samples,
        n_estimators=50,
        random_state=42,
    )
    score = cross_val_score(rf_model, x, y, cv=3).mean()
    return score

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=3)

trial = study.best_trial
print("Best Score: ", trial.value)
print("Best Params: ")
for key, value in trial.params.items():
    print("  {}: {}".format(key, value))
The issue you're encountering is due to the inherent stochastic nature of the optimization algorithm that Optuna uses, not just the random behavior of the RandomForestClassifier. Here's why you see different results even with the same random_state (for RandomForestClassifier!):

1. Random search in Optuna: Even though the RandomForest has a fixed random_state, the search algorithm used by Optuna itself is probabilistic. Optuna uses a combination of random search and Bayesian optimization, so each run of the study can explore the hyperparameter space in a slightly different manner.

2. Sampling in Optuna: The trial.suggest_* methods internally use samplers to suggest the next hyperparameters to try. The default sampler in Optuna is, AFAIK, TPESampler, which is based on Bayesian optimization and uses past trial results to suggest the next set of hyperparameters. This process has some randomness in it.

So to ensure reproducibility, you should set seeds for both Optuna and the algorithms or models you're tuning. In your case, you've set the random seed for RandomForestClassifier, but not for Optuna. Try this to make your optimization reproducible: