LSTM with Keras to optimize a black box function


I'm trying to implement the recurrent neural network architecture proposed in this paper (https://arxiv.org/abs/1611.03824), where the authors use an LSTM to minimize a black-box function (which is, however, assumed to be differentiable). Here is a diagram of the proposed architecture: [RNN diagram]. Briefly, the idea is to use the LSTM as an optimizer, which has to learn a good heuristic for proposing new parameters of the unknown function y=f(parameters), so that it moves towards a minimum. Here's how the proposed procedure works:

  1. Select an initial value for the parameters p0 and evaluate the function y0 = f(p0)
  2. Call the LSTM cell with input=[p0,y0]; its output is a new value for the parameters, output=p1
  3. Evaluate y1 = f(p1)
  4. Call the LSTM cell with input=[p1,y1] and obtain output=p2
  5. Evaluate y2 = f(p2)
  6. Repeat for a few iterations, for example stopping at the fifth one: y5 = f(p5).

I'm trying to implement a similar model in TensorFlow/Keras, but I'm having some trouble. In particular, this case differs from the "standard" ones because we don't have a predefined time sequence to analyze; instead, the sequence is generated online, after each iteration of the LSTM cell. Thus, in this case, the input consists of just the starting guess [p0,y0=f(p0)] at time t=0. If I understood correctly, this model is similar to a one-to-many LSTM, with the difference that the input to the next time step does not come only from the previous cell, but also from the output of an additional function (in our case f).
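
To make the desired computation concrete, here is a minimal eager-mode sketch of the unrolled loop I have in mind, using a plain tf.keras.layers.LSTMCell and a toy quadratic as a stand-in for f (the parameter dimension and the number of steps are arbitrary choices of mine):

import tensorflow as tf

# Toy stand-in for the unknown black-box function: y = sum(p^2)
def f(params):
    return tf.reduce_sum(tf.square(params), axis=-1, keepdims=True)

n_params = 4                               # dimension of the parameter vector (arbitrary)
cell = tf.keras.layers.LSTMCell(n_params)  # the cell output is used directly as the new parameters

# Initial guess p0 and its cost y0 = f(p0), with a batch dimension of 1
p = tf.random.normal((1, n_params))
y = f(p)
h = tf.zeros((1, n_params))
c = tf.zeros((1, n_params))

# Unroll the procedure for a fixed number of steps (five, as in the example above)
for t in range(5):
    lstm_input = tf.concat([y, p], axis=-1)      # concatenated input [y_t, p_t]
    p, (h, c) = cell(lstm_input, states=[h, c])  # the LSTM proposes p_{t+1}
    y = f(p)                                     # evaluate y_{t+1} = f(p_{t+1})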

I managed to create a custom tf.keras.layers.Layer which performs the calculation for a single time step (that is, it runs the LSTM cell and then uses its output as input to the function f):

import tensorflow as tf

class my_layer(tf.keras.layers.Layer):
    def __init__(self, units=4):
        super(my_layer, self).__init__()
        self.cell = tf.keras.layers.LSTMCell(units)

    def call(self, inputs):
        prev_cost = inputs[0]
        prev_params = inputs[1]
        prev_h = inputs[2]
        prev_c = inputs[3]

        # Concatenate the previous cost and previous parameters to create the new input
        new_input = tf.keras.layers.concatenate([prev_cost, prev_params])

        # New parameters proposed by the LSTM cell, along with its new internal states h and c
        new_params, [new_h, new_c] = self.cell(new_input, states=[prev_h, prev_c])

        # Evaluate the black-box function on the proposed parameters
        new_cost = f(new_params)

        return [new_cost, new_params, new_h, new_c]
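
For completeness, this is how I build the initial inputs that I feed to this layer (batch size 1, zero initial LSTM states, and the same toy f as in the sketch above are my own assumptions):

units = 4

# Initial parameter guess, its cost, and zero initial states for the LSTM cell
p0 = tf.random.normal((1, units))
y0 = f(p0)
h0 = tf.zeros((1, units))
c0 = tf.zeros((1, units))

inputs = [y0, p0, h0, c0]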

but I do not know how to build the recurrent part. I tried to do it manually, that is, by doing something like:

my_cell = my_layer(units = 4)

outputs = my_cell(inputs)
outputs1 = my_cell(outputs)
outputs2 = my_cell(outputs1)
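
In loop form, that manual chaining amounts to the following, where the final cost is the quantity I would eventually want to minimize (the number of steps is arbitrary):

n_steps = 5
outputs = inputs
for _ in range(n_steps):
    outputs = my_cell(outputs)

final_cost = outputs[0]  # cost produced by the last proposed parameters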

Is that correct? Is there a more appropriate way to do it?

Bonus question: I would like to train the LSTM to optimize not just a single function f, but rather a class of different functions [f1, f2, ...] that share some common structure, which makes them similar enough to be optimized with the same LSTM. How could I implement a training loop that takes a list of such functions [f1, f2, ...] as input and tries to minimize them all? My first thought was to do it the "brute force" way: use a for loop over the functions and a tf.GradientTape which evaluates and applies the gradients for each function.
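
Concretely, the brute-force loop I have in mind looks roughly like this (only a sketch: f1 and f2 are assumed to be defined, differentiable TensorFlow functions sharing the same parameter dimension, and I unroll a plain LSTMCell here because my_layer hard-codes a single f):

cell = tf.keras.layers.LSTMCell(4)          # 4 = parameter dimension, arbitrary
optimizer = tf.keras.optimizers.Adam(1e-3)
functions = [f1, f2]                        # the class of functions to be minimized
n_steps = 5

for epoch in range(100):
    for fn in functions:
        with tf.GradientTape() as tape:
            # Fresh starting guess for this function
            p = tf.random.normal((1, 4))
            y = fn(p)
            h = tf.zeros((1, 4))
            c = tf.zeros((1, 4))
            # Unroll the optimizer LSTM and take the final cost as the training loss
            for _ in range(n_steps):
                p, (h, c) = cell(tf.concat([y, p], axis=-1), states=[h, c])
                y = fn(p)
            loss = tf.reduce_mean(y)
        grads = tape.gradient(loss, cell.trainable_variables)
        optimizer.apply_gradients(zip(grads, cell.trainable_variables))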

Any help is much appreciated! Thank you very much in advance! :)
