Convergence guarantee of Policy Gradient with function approximation

Is there any convergence proof of the Policy Gradient algorithm with "general" value/Q-function approximation? The seminal papers (Sutton1999 & Tsitsiklis1999) prove the theorem under a compatibility assumption (i.e. the Q-function approximator is linear in the policy's score-function features, $\nabla_\theta \log \pi_\theta$). Later improvements such as DPG (Silver2014) rely on similar assumptions.
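
For reference, the compatibility condition in (Sutton1999) is roughly the following (a sketch in standard notation, not the paper's full statement): the critic must take the form

$$ f_w(s, a) = w^\top \nabla_\theta \log \pi_\theta(a \mid s), $$

and $w$ must minimize the mean-squared error between $f_w$ and $Q^\pi$. Under these two conditions, replacing $Q^\pi$ by $f_w$ in the policy-gradient theorem leaves the gradient exact.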

Yet in practice this compatibility assumption is not satisfied: the policy network and the Q-function network have their own, independent sets of parameters.
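
To make the practical setting concrete, below is a minimal sketch (PyTorch, with hypothetical layer sizes) of the usual actor-critic parameterization, where the policy and the Q-function are separate networks with disjoint parameters, so nothing enforces the compatibility condition above:

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2  # hypothetical dimensions for illustration

# Policy network: parameters theta, here outputting the mean of a Gaussian policy
policy = nn.Sequential(
    nn.Linear(obs_dim, 64), nn.Tanh(),
    nn.Linear(64, act_dim),
)

# Q-function network: completely separate parameters w,
# not constrained to be linear in grad_theta log pi (no compatibility)
q_func = nn.Sequential(
    nn.Linear(obs_dim + act_dim, 64), nn.Tanh(),
    nn.Linear(64, 1),
)

# Two independent optimizers over two disjoint parameter sets
policy_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)
q_opt = torch.optim.Adam(q_func.parameters(), lr=1e-3)
```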

Hence I wonder to what extent those methods are supported by theoretical guarantees.

Thanks,

(Sutton1999): Sutton et al., "Policy Gradient Methods for Reinforcement Learning with Function Approximation," 1999.
(Tsitsiklis1999): Konda & Tsitsiklis, "Actor-Critic Algorithms," 1999.
(Silver2014): Silver et al., "Deterministic Policy Gradient Algorithms," 2014.
