Convergence of Q-learning on the inverted pendulum


Hello, I'm working on full control of the cartpole problem (inverted pendulum). My aim is for the system to reach stability, meaning all the states (x, x_dot, theta and theta_dot) should converge to zero. I am using Q-learning with the update rule and reward function defined below.

Q_table[pre_s + (a,)] += alpha * (R + gamma * argmax(Q_table[s]) - Q_table[pre_s + (a,)])
R = 1000*cos(theta) - 1000*(theta_dot**2) - 100*(x_dot**2) - 100*(x**2)
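For context, here is a minimal sketch of how this update and reward could sit in a training loop. The gymnasium CartPole-v1 environment, the bin counts, the clipping ranges and the hyperparameters are illustrative assumptions, not my actual setup; the bootstrap term uses the max over the next state's Q-values, which is the standard tabular Q-learning target.

```python
# Illustrative sketch only: the environment, bins, ranges and hyperparameters
# below are assumptions, not the setup from the question.
import numpy as np
import gymnasium as gym

env = gym.make("CartPole-v1")
n_bins = (6, 6, 12, 12)                       # bins for x, x_dot, theta, theta_dot
lows   = np.array([-2.4, -3.0, -0.21, -3.0])  # assumed clipping ranges for the states
highs  = np.array([ 2.4,  3.0,  0.21,  3.0])
Q_table = np.zeros(n_bins + (env.action_space.n,))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices."""
    ratios = (np.clip(obs, lows, highs) - lows) / (highs - lows)
    return tuple((ratios * (np.array(n_bins) - 1)).round().astype(int))

for episode in range(5000):
    obs, _ = env.reset()
    pre_s = discretize(obs)
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q_table[pre_s]))
        obs, _, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        x, x_dot, theta, theta_dot = obs
        # reward as written above
        R = 1000*np.cos(theta) - 1000*theta_dot**2 - 100*x_dot**2 - 100*x**2
        s = discretize(obs)
        # tabular Q-learning update; bootstrap with the max over the next state's actions
        Q_table[pre_s + (a,)] += alpha * (R + gamma * np.max(Q_table[s]) - Q_table[pre_s + (a,)])
        pre_s = s
```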

Unfortunately, there is no convergence. From the Q-table graph I can see the values increasing and stabilising at a maximum, but the states just stay within a certain bound and do not go to zero. I feel like my agent is not learning fast enough and at some point is not learning anymore. Can anyone help me?


There is 1 answer

R.F. Nelson:

Assuming you are using an epsilon-greedy approach, your values for alpha and gamma could make a big difference. I suggest experimenting with those values and seeing how that influences your agent; a small sketch follows.
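For illustration, here is a minimal epsilon-greedy helper together with an annealing schedule. The start value, floor and decay rate are arbitrary numbers to experiment with, not recommended settings.

```python
import numpy as np

def epsilon_greedy(Q_table, s, n_actions, epsilon):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q_table[s]))

def annealed_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.995):
    """Decay exploration over episodes: explore early, exploit later."""
    return max(eps_min, eps_start * decay ** episode)
```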

Additionally, can you explain the logic behind your reward function? It seems unusual to multiply everything by 1000.
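For what it's worth, a uniformly rescaled version of the same shaping keeps the relative weighting of the terms (which is what actually shapes the behaviour) while keeping the Q-values, and hence the TD errors, small. The 0.1 weights below are just the original 100/1000 ratio; this is a sketch of an alternative, not a statement about your setup.

```python
import numpy as np

def shaped_reward(x, x_dot, theta, theta_dot):
    # same relative weights as the original reward, divided by 1000
    return np.cos(theta) - theta_dot**2 - 0.1*x_dot**2 - 0.1*x**2
```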