How to use the optimizer in tensorflow2 correctly?


I'm asking myself: does the following code do only one step of gradient descent, or does it run the whole gradient descent algorithm?

opt = tf.keras.optimizers.SGD(learning_rate=self.learning_rate)   
train = opt.minimize(self.loss, var_list=[self.W1, self.b1, self.W2, self.b2, self.W3, self.b3])

You need to perform a number of gradient descent steps, which you determine. But I'm not sure whether opt.minimize(self.loss, var_list=[self.W1, self.b1, self.W2, self.b2, self.W3, self.b3]) performs all steps or just one step of gradient descent. Why do I think it does all steps? Because my loss is zero afterwards.

There is 1 answer

Answered by Sascha Kirch

tf.keras.optimizers.Optimizer.minimize() calculates the gradients and applies them. Hence, it's a single step.

In the documentation of this function you can read:

This method simply computes gradient using tf.GradientTape and calls apply_gradients(). If you want to process the gradient before applying then call tf.GradientTape and apply_gradients() explicitly instead of using this function.
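That explicit pattern can be sketched as follows (a minimal toy example with a hypothetical scalar variable, not your network): the gradient is computed with `tf.GradientTape`, processed — here clipped, which `minimize()` cannot do for you — and then applied with `apply_gradients()`. Each pass through this pattern is exactly one optimizer step.

```python
import tensorflow as tf

# Toy variable standing in for your model weights (hypothetical values).
w = tf.Variable(5.0)
opt = tf.keras.optimizers.SGD(learning_rate=0.1)

# Compute the gradient explicitly with a tape...
with tf.GradientTape() as tape:
    loss = (w - 2.0) ** 2
grads = tape.gradient(loss, [w])

# ...process it before applying (here: clip its norm to 1.0)...
clipped = [tf.clip_by_norm(g, 1.0) for g in grads]

# ...then apply it. This is one single gradient-descent step.
opt.apply_gradients(zip(clipped, [w]))
```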

Which can also be seen from the implementation of minimize():

  def minimize(self, loss, var_list, grad_loss=None, name=None, tape=None):
    """Minimize `loss` by updating `var_list`.
    This method simply computes gradient using `tf.GradientTape` and calls
    `apply_gradients()`. If you want to process the gradient before applying
    then call `tf.GradientTape` and `apply_gradients()` explicitly instead
    of using this function.
    Args:
      loss: `Tensor` or callable. If a callable, `loss` should take no arguments
        and return the value to minimize. If a `Tensor`, the `tape` argument
        must be passed.
      var_list: list or tuple of `Variable` objects to update to minimize
        `loss`, or a callable returning the list or tuple of `Variable` objects.
        Use callable when the variable list would otherwise be incomplete before
        `minimize` since the variables are created at the first time `loss` is
        called.
      grad_loss: (Optional). A `Tensor` holding the gradient computed for
        `loss`.
      name: (Optional) str. Name for the returned operation.
      tape: (Optional) `tf.GradientTape`. If `loss` is provided as a `Tensor`,
        the tape that computed the `loss` must be provided.
    Returns:
      An `Operation` that updates the variables in `var_list`. The `iterations`
      will be automatically increased by 1.
    Raises:
      ValueError: If some of the variables are not `Variable` objects.
    """
    grads_and_vars = self._compute_gradients(
        loss, var_list=var_list, grad_loss=grad_loss, tape=tape)
    return self.apply_gradients(grads_and_vars, name=name)
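So to run actual gradient descent you call `minimize()` (or the tape + `apply_gradients()` pair) once per step, inside a loop you control. A minimal sketch with a hypothetical one-parameter problem, assuming plain TF2 eager execution:

```python
import tensorflow as tf

# Toy problem (hypothetical): find w such that w * x matches y, i.e. w -> 2.0.
w = tf.Variable(5.0)
x = tf.constant(3.0)
y = tf.constant(6.0)

opt = tf.keras.optimizers.SGD(learning_rate=0.01)

# Each loop iteration is ONE gradient-descent step; the number of
# steps is up to you, which is exactly the point of the answer above.
for step in range(200):
    with tf.GradientTape() as tape:
        loss = (w * x - y) ** 2
    grads = tape.gradient(loss, [w])
    opt.apply_gradients(zip(grads, [w]))
```

After enough iterations `w` converges toward 2.0; running the loop body only once moves it just slightly, which is why a single `minimize()` call cannot be the whole algorithm.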