I am running a grid search with Optuna, but trials that end in the FAIL state are not retried on a second run. Instead, trials that are already COMPLETE are needlessly repeated.
Here are the two problems, described separately:
- When a trial fails (e.g. for lack of computational resources), it is not repeated when I launch the grid search (the Python file) a second time. This can be tested with the following self-contained code, in which I simulate a failure by raising an exception. Comment out the marked lines and run the script a second time to see that the combination x=2, y=2 is not repeated.
import time
import optuna
from optuna.storages import RetryFailedTrialCallback
import numpy as np


def objective(trial):
    # get value
    params = {
        'x': trial.suggest_categorical('x', [0, 1, 2, 3]),
        'y': trial.suggest_categorical('y', [0, 1, 2, 3])
    }
    # print it
    print('Testing with x=' + str(params['x']), 'y=' + str(params['y']))
    ########################################
    # COMMENT THIS SECTION AFTER FIRST RUN #
    ########################################
    if params['x'] == 2 and params['y'] == 2:
        raise ValueError("x==2, y==2")
    ########################################
    # return
    return params['x'] ** 2 - params['y']


def optuna_search_space():
    # define search space
    return {
        'x': range(3),
        'y': range(3),
    }


def optuna_grid():
    # define URL
    URL = 'mysql://<USER>:<PASSWORD>@<IP>:<PORT>'
    # get search space
    search_space = optuna_search_space()
    # define storage
    storage = optuna.storages.RDBStorage(
        url=f"{URL}/prove_optuna",
        failed_trial_callback=RetryFailedTrialCallback(max_retry=3),
    )
    # define study
    study = optuna.load_study(
        study_name="test1",
        sampler=optuna.samplers.GridSampler(search_space),
        storage=storage,
    )
    # run
    study.optimize(objective)
    # print
    print(study.best_trial)


if __name__ == "__main__":
    # run
    optuna_grid()
- When I re-run the code, it repeats one or more trials that have already been completed. I don't want this, as it wastes computational resources.
On the Optuna Dashboard I can see that, after several re-runs, the combination (x=2, y=2) is never repeated (even though it failed the first time), while the combination (x=0, y=1) has been tested several times (uselessly).
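The same observation can be checked without the dashboard by counting trials per parameter combination and state. The following is only a minimal sketch: the helper names `summarize_trials` and `find_problems` are hypothetical, and the `demo_trials` list is made-up stand-in data that mimics what I see (it is not output from my study). The helpers only assume that each trial object exposes `.params` and `.state.name`, as Optuna's frozen trials do.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_trials(trials):
    """Count trials per (x, y, state name) combination."""
    counts = Counter()
    for t in trials:
        counts[(t.params.get("x"), t.params.get("y"), t.state.name)] += 1
    return counts


def find_problems(counts):
    """Return (combinations completed more than once, combinations that only failed)."""
    completed = {(x, y) for (x, y, state) in counts if state == "COMPLETE"}
    failed = {(x, y) for (x, y, state) in counts if state == "FAIL"}
    repeated = sorted(
        (x, y)
        for (x, y, state), n in counts.items()
        if state == "COMPLETE" and n > 1
    )
    failed_only = sorted(failed - completed)
    return repeated, failed_only


def fake_trial(x, y, state):
    # Hypothetical stand-in for an Optuna trial record (illustration only).
    return SimpleNamespace(params={"x": x, "y": y}, state=SimpleNamespace(name=state))


# Made-up data mirroring the situation described above.
demo_trials = [
    fake_trial(0, 1, "COMPLETE"),
    fake_trial(0, 1, "COMPLETE"),
    fake_trial(0, 1, "COMPLETE"),
    fake_trial(1, 0, "COMPLETE"),
    fake_trial(2, 2, "FAIL"),
]

repeated, failed_only = find_problems(summarize_trials(demo_trials))
print("completed more than once:", repeated)
print("failed and never completed:", failed_only)
```

With the real study, the input would be `study.get_trials(deepcopy=False)` instead of `demo_trials`.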
How can I solve these problems?
Thank you