I have been trying to implement a policy gradient algorithm in reinforcement learning. However, I get the error "ValueError: No gradients provided for any variable:" when computing the gradients for the custom loss function shown below:
def loss_function(prob, action, reward):
    prob_action = np.array([prob.numpy()[0][action]])  # prob is like [0.4900, 0.5200]; action is a scalar index (1 or 0)
    log_prob = tf.math.log(prob_action)
    loss = tf.multiply(log_prob, (-reward))
    return loss
I am computing the gradients as below:
def update_policy(policy, states, actions, discounted_rewards):
    opt = tf.keras.optimizers.SGD(learning_rate=0.1)
    for state, reward, action in zip(states, discounted_rewards, actions):
        with tf.GradientTape() as tape:
            prob = policy(state, training=True)
            loss = loss_function(prob, action, reward)
            print(loss)
        gradients = tape.gradient(loss, policy.trainable_variables)
        opt.apply_gradients(zip(gradients, policy.trainable_variables))
Please help me resolve this issue. Thank you.
As @gekrone indicates in the comment, this is definitely because the gradients cannot flow: prob_action is a NumPy array and not a tensor, so the tape cannot connect the loss to the policy variables. Also be careful not to use the .numpy() method in the loss computation. Stick to indexing the tensor directly with TensorFlow operations and this should work.
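For illustration, here is a minimal sketch of a loss function that keeps everything as tensors. The small `policy` model and the `state` shape of (1, 4) are assumptions made up for this example, not taken from the question; only the indexing inside `loss_function` is the point.

```python
import tensorflow as tf

def loss_function(prob, action, reward):
    # Index the tensor directly: no .numpy() and no np.array,
    # so the operation stays recorded on the gradient tape.
    prob_action = prob[0, action]
    log_prob = tf.math.log(prob_action)
    # Cast the reward so a Python or NumPy float matches the tensor dtype.
    return -tf.cast(reward, prob.dtype) * log_prob

# Hypothetical two-action policy, used only to exercise the loss.
policy = tf.keras.Sequential([
    tf.keras.layers.Dense(2, activation="softmax"),
])
state = tf.ones((1, 4))  # assumed state shape

with tf.GradientTape() as tape:
    prob = policy(state, training=True)
    loss = loss_function(prob, action=1, reward=1.0)

# With the tensor-only loss, every trainable variable receives a gradient.
gradients = tape.gradient(loss, policy.trainable_variables)
```

With this change, the `opt.apply_gradients(zip(gradients, policy.trainable_variables))` call from the question should no longer raise the "No gradients provided" error, since none of the gradients are `None`.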