How to use the Bayesian-optimization package with multiple return values?


I'm using the Python package "Bayesian Optimization" (https://github.com/fmfn/BayesianOptimization) for parameter optimization. By default, black_box_function is expected to return a single value, which is the value to be optimized. In my case, I'm running the optimization on a GCP cluster using Ray and need to return multiple values (a dataframe, for example), but only one of them should be used for the optimization. Unfortunately, the package treats everything returned by black_box_function as the optimization target. Is there a way to explicitly specify which returned value to use for the optimization?

import pandas as pd
import numpy as np
from bayes_opt import BayesianOptimization

def black_box_function(x, y):
    df = pd.DataFrame(
        np.random.randint(0, 10, size=(10, 4)), columns=["A", "B", "C", "D"]
    )  # data I want to store (produced on each node of the cluster)
    return -x ** 2 - (y - 1) ** 2 + 1, df  # returns multiple values


pbounds = {'x': (2, 4), 'y': (-3, 3)}

optimizer = BayesianOptimization(
    f=black_box_function,
    pbounds=pbounds,
    verbose=2, 
    random_state=1,
)

optimizer.maximize(  # How to tell the maximizer which value to maximize?
    init_points=2,
    n_iter=3,
)

I have tried to work around the issue by avoiding multiple return values:

  • Using ray.put(df) to store df in the object store from within the black_box_function. Unfortunately, I haven't managed to retrieve the stored data, because I am not able to get or store the corresponding object references from the nodes.
  • Ingesting df directly into a BigQuery table from within the black_box_function. This approach seems to run into authorization issues related to parallel data ingestion: ingestion works only for the first node that starts writing to the BigQuery table, while all other nodes do not have access to the table.
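
For reference, a third workaround along the same lines could look like the minimal sketch below: wrap the function so that only the first return value is handed to the optimizer, while the extra value is kept in a list. The names split_target, collected and objective are hypothetical and not part of the package, and a plain dict stands in for the dataframe so that the snippet runs without pandas. It assumes the wrapper is called in the same process that owns the list.

```python
from typing import Any, Callable, Dict, List, Tuple


def split_target(
    func: Callable[..., Tuple[float, Any]],
    side_outputs: List[Dict[str, Any]],
) -> Callable[..., float]:
    """Wrap `func` so that only its first return value is returned.

    Hypothetical helper, not part of bayes_opt.
    """

    def target_only(**params: float) -> float:
        target, extra = func(**params)
        # Keep the extra payload together with the parameters that produced it.
        side_outputs.append(
            {"params": dict(params), "target": target, "extra": extra}
        )
        return float(target)

    return target_only


def black_box_function(x: float, y: float) -> Tuple[float, Dict[str, float]]:
    # Placeholder side output standing in for the dataframe in the question.
    extra = {"x_squared": x ** 2}
    return -x ** 2 - (y - 1) ** 2 + 1, extra


collected: List[Dict[str, Any]] = []
objective = split_target(black_box_function, collected)

# `objective` is what would be passed as f=... to BayesianOptimization
# instead of black_box_function; here it is simply called once by hand.
value = objective(x=2.0, y=1.0)
```

After the call, value holds only the scalar target and collected holds one entry with the parameters and the extra payload. If each evaluation runs as a Ray task instead, the list on the driver is not shared with the workers; in that case the task itself would have to return the full tuple, and the split would happen on the driver after ray.get.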

There are 0 answers