optuna: Different Results Even With Same random_state


I am trying to understand why running the code below for hyperparameter tuning with Optuna gives me different best parameter values on each run, even though I am running the exact same code with the same random_state = 42. Where is the randomness coming from?

import optuna
import sklearn.datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def objective(trial):
    digits = sklearn.datasets.load_digits()
    x, y = digits.data, digits.target

    # Hyperparameters drawn by Optuna's sampler
    max_depth = trial.suggest_int("rf_max_depth", 2, 64, log=True)
    max_samples = trial.suggest_float("rf_max_samples", 0.2, 1)

    rf_model = RandomForestClassifier(
        max_depth=max_depth,
        max_samples=max_samples,
        n_estimators=50,
        random_state=42,
    )

    # Mean accuracy over 3-fold cross-validation
    score = cross_val_score(rf_model, x, y, cv=3).mean()
    return score

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=3)
trial = study.best_trial

print("Best Score: ", trial.value)
print("Best Params: ")
for key, value in trial.params.items():
    print("  {}: {}".format(key, value))

There is 1 answer

Answered by Iskander14yo (best answer):

The issue you're encountering is due to the inherent stochastic nature of the optimization algorithm that Optuna uses, not the RandomForestClassifier, which your random_state = 42 already makes deterministic.

Here's why you see different results even with the same random_state (for RandomForestClassifier!):

  • Random Search in Optuna: Even though the RandomForest has a fixed random_state, the search algorithm used by Optuna itself is probabilistic. The default sampler begins with purely random startup trials (10 by default) before switching to Bayesian optimization, so with n_trials = 3 your study never even leaves the random phase, and each run can explore the hyperparameter space differently.

  • Sampling in Optuna: The trial.suggest_* methods don't draw values themselves; they delegate to the study's sampler to suggest the next hyperparameters to try. The default sampler in Optuna is TPESampler, which is based on Bayesian optimization and uses past trial results to suggest the next set of hyperparameters. Unless you pass it a seed, it is seeded freshly on every run, so its suggestions differ from run to run (see the sketch after this list).
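
A quick way to see the sampler's role is to compare unseeded and seeded runs. Below is a minimal sketch with a hypothetical stand-in objective (tiny_objective and first_suggestion are illustrative names, not part of Optuna); the suggested values come from the study's sampler, so only the seeded pair matches:

import optuna

optuna.logging.set_verbosity(optuna.logging.WARNING)  # silence per-trial logs

def tiny_objective(trial):
    # The value below is drawn by the study's sampler, not by any model
    return trial.suggest_float("x", 0.2, 1)

def first_suggestion(sampler):
    study = optuna.create_study(direction="maximize", sampler=sampler)
    study.optimize(tiny_objective, n_trials=1)
    return study.best_trial.params["x"]

# Unseeded: a fresh RNG each time, so these two usually differ
print(first_suggestion(optuna.samplers.TPESampler()))
print(first_suggestion(optuna.samplers.TPESampler()))

# Seeded: identical suggestion every time
print(first_suggestion(optuna.samplers.TPESampler(seed=42)))
print(first_suggestion(optuna.samplers.TPESampler(seed=42)))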

So to ensure reproducibility, you should set seeds for both Optuna and the algorithms or models you're tuning. In your case, you've set the random seed for RandomForestClassifier, but not for Optuna. (The cross-validation itself is not a source of randomness here: with cv=3 and a classifier, cross_val_score uses StratifiedKFold without shuffling, so the splits are deterministic.)

Try this to make your optimization reproducible:

# Seed Optuna's sampler so the suggested hyperparameters repeat across runs
sampler = optuna.samplers.TPESampler(seed=42)
study = optuna.create_study(direction="maximize", sampler=sampler)
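
To convince yourself, you can run the whole optimization twice, each time with a freshly constructed, identically seeded sampler, and check that the best parameters match. A minimal sketch, assuming the objective function from your question is in scope (run_study is just an illustrative helper):

def run_study():
    # A fresh, identically seeded sampler for each run
    sampler = optuna.samplers.TPESampler(seed=42)
    study = optuna.create_study(direction="maximize", sampler=sampler)
    study.optimize(objective, n_trials=3)
    return study.best_trial.params

assert run_study() == run_study()  # identical best params across runs

One caveat: even with a seeded sampler, results are generally not reproducible if you parallelize the study (e.g. study.optimize(..., n_jobs=2) or several distributed workers), because trials then complete in a nondeterministic order.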