I'm using the Python package "Bayesian Optimization" (https://github.com/fmfn/BayesianOptimization) for parameter optimization. By default, the black_box_function always returns the single value to be optimized. In my case, I'm running the optimization on a GCP cluster using Ray and need to return multiple values from each evaluation (for example, a DataFrame), but only one of them should be used for the optimization. Unfortunately, the package treats everything returned by black_box_function as the optimization target. Is there a way to explicitly specify which returned value to use for the optimization?
import pandas as pd
import numpy as np
from bayes_opt import BayesianOptimization

def black_box_function(x, y):
    df = pd.DataFrame(
        np.random.randint(0, 10, size=(10, 4)), columns=["A", "B", "C", "D"]
    )  # data I want to store (produced on each node of the cluster)
    return -x ** 2 - (y - 1) ** 2 + 1, df  # returns multiple values

pbounds = {'x': (2, 4), 'y': (-3, 3)}

optimizer = BayesianOptimization(
    f=black_box_function,
    pbounds=pbounds,
    verbose=2,
    random_state=1,
)

optimizer.maximize(  # How to tell the maximizer which value to maximize?
    init_points=2,
    n_iter=3,
)

I have tried to work around the issue by avoiding multiple return values:
- Using ray.put(df) to store df in the object store from within black_box_function. Unfortunately, I haven't managed to retrieve the stored data, because I am not able to get or store the corresponding object references from the nodes.
- Ingesting df directly into a BigQuery table from within black_box_function. This approach seems to run into authorization issues related to parallel data ingestion: ingestion works only for the first node that starts writing to the table, while all other nodes do not have access to it.
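
To make the goal concrete, here is a minimal sketch of the behaviour I am after: a thin wrapper that hands only the first return value to the caller and keeps the rest on the side. The names keep_first_only, side_outputs and collected are hypothetical and not part of the bayes_opt package. The sketch assumes that the optimizer calls the function with keyword arguments and that the wrapper runs in the driver process. It uses a plain list as a stand-in for the DataFrame and does not address the Ray or BigQuery issues described above.

```python
from typing import Any, Callable, Dict, List, Tuple


def keep_first_only(
    func: Callable[..., Tuple[float, Any]],
    side_outputs: List[Dict[str, Any]],
) -> Callable[..., float]:
    """Wrap func so that callers see only its first return value.

    The second return value is appended to side_outputs together with
    the parameters that produced it. (Hypothetical helper, not part of
    bayes_opt.)
    """

    def wrapped(**params: float) -> float:
        target, extra = func(**params)
        side_outputs.append({"params": dict(params), "extra": extra})
        return float(target)

    return wrapped


def black_box_function(x, y):
    # Stand-in for the real function: a score plus an extra payload
    # (a plain list here, a DataFrame in the question).
    extra = [x, y]
    return -x ** 2 - (y - 1) ** 2 + 1, extra


collected: List[Dict[str, Any]] = []
target_only = keep_first_only(black_box_function, collected)

# Calling the wrapper yields just the scalar; the payload lands in collected.
value = target_only(x=2.0, y=1.0)
```

With this in place, target_only (rather than black_box_function) would be passed as f= to BayesianOptimization, and collected would hold one entry per evaluation after maximize() returns.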