How is the discount factor taken into account in Stable Baselines3 on-policy methods, e.g. PPO?


I would like to understand how gamma affects the learned policy. I cannot tell whether the final reward is discounted linearly or exponentially.

I would expect the final reward to be something like

R = sum_i gamma ^ (i) * rew_i

but I cannot find this in the main code. Thank you
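
To make the formula above concrete, here is a minimal sketch in plain Python. It is not taken from the Stable Baselines3 source, and the function names are my own. It computes the sum directly and also through a backward recursion, G_t = rew_t + gamma * G_{t+1}, which gives the same number without ever writing gamma ^ i. If the library uses a recursion of this kind (an assumption on my part), that could be why the sum does not appear literally in the code.

```python
def discounted_return_sum(rewards, gamma):
    """Explicit sum: R = sum_i gamma**i * rew_i."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))


def discounted_return_recursive(rewards, gamma):
    """Same quantity via backward recursion: G_t = rew_t + gamma * G_{t+1}."""
    g = 0.0
    # Walk from the last reward to the first, discounting the running total.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Illustrative values only (not from any real training run).
rewards = [1.0, 1.0, 1.0]
gamma = 0.5

print(discounted_return_sum(rewards, gamma))        # 1.75
print(discounted_return_recursive(rewards, gamma))  # 1.75
```

Under this formula the weight on a reward i steps ahead is gamma ^ i, so the discount would be exponential in i rather than linear.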


There are 0 answers