Optuna in-memory parallelization

48 views Asked by mavex857 At 02 March 2024 at 19:10

I am performing hyperparameter optimization with Optuna (from within rl-zoo) and have some questions about parallelization.

The docs recommend process-based (i.e. distributed) parallelization with a shared storage, which in the examples is a local SQL database. They also say that the study should be created with the storage argument, which I take to mean setting it to something other than None. Leaving it at None (the default) makes Optuna use in-memory storage. The docstring of create_study says: "storage: Database URL. If this argument is set to None, in-memory storage is used, and the :class:`~optuna.study.Study` will not be persistent."

Does that mean that parallelization simply does not work with in-memory storage? Or only that the study is not saved anywhere, so it vanishes once the run is done and cannot be inspected afterwards the way a study stored in a database could? I am working on an HPC cluster and have trouble setting up the SQL server due to permissions, which is why I am asking.
Alternatively, one can use thread-based parallelization with --n-jobs > 1. Does that conflict in any way with the distributed approach, or can I freely combine the two?
