Convergence of Q-learning on the inverted pendulum


Hello, I'm working on full control of the cartpole problem (inverted pendulum). My aim is for the system to reach stability, meaning all the states (x, x_dot, theta and theta_dot) should converge to zero. I am using Q-learning with the update rule and reward function defined below.

Q_table[pre_s + (a,)] += alpha * (R + gamma * argmax(Q_table[s]) - Q_table[pre_s + (a,)])
R = 1000*cos(theta) - 1000*(theta_dot**2) - 100*(x_dot**2) - 100*(x**2)
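For context, here is a minimal sketch of how this update and reward could sit in a training loop. The gymnasium CartPole-v1 environment, the bin counts, the clipping ranges and the hyperparameters are illustrative assumptions, not my actual setup; the bootstrap term uses the max over the next state's Q-values, which is the standard tabular Q-learning target.

```python
# Illustrative sketch only: the environment, bins, ranges and hyperparameters
# below are assumptions, not the setup from the question.
import numpy as np
import gymnasium as gym

env = gym.make("CartPole-v1")
n_bins = (6, 6, 12, 12)                       # bins for x, x_dot, theta, theta_dot
lows   = np.array([-2.4, -3.0, -0.21, -3.0])  # assumed clipping ranges for the states
highs  = np.array([ 2.4,  3.0,  0.21,  3.0])
Q_table = np.zeros(n_bins + (env.action_space.n,))
alpha, gamma, epsilon = 0.1, 0.99, 0.1

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices."""
    ratios = (np.clip(obs, lows, highs) - lows) / (highs - lows)
    return tuple((ratios * (np.array(n_bins) - 1)).round().astype(int))

for episode in range(5000):
    obs, _ = env.reset()
    pre_s = discretize(obs)
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            a = env.action_space.sample()
        else:
            a = int(np.argmax(Q_table[pre_s]))
        obs, _, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        x, x_dot, theta, theta_dot = obs
        # reward as written above
        R = 1000*np.cos(theta) - 1000*theta_dot**2 - 100*x_dot**2 - 100*x**2
        s = discretize(obs)
        # tabular Q-learning update; bootstrap with the max over the next state's actions
        Q_table[pre_s + (a,)] += alpha * (R + gamma * np.max(Q_table[s]) - Q_table[pre_s + (a,)])
        pre_s = s
```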

Unfortunately, there is no convergence. From the Q-table graph I can see the values increasing and stabilising at a maximum, but the states just stay within a certain bound and do not go to zero. I feel like my agent is not learning fast enough and at some point is not learning anymore. Can anyone help me?


There is 1 answer

R.F. Nelson:

Assuming you are using an epsilon-greedy approach, your values for alpha and gamma could make a big difference. I suggest experimenting with those values and seeing how that influences your agent; a small sketch follows.
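For illustration, here is a minimal epsilon-greedy helper together with an annealing schedule. The start value, floor and decay rate are arbitrary numbers to experiment with, not recommended settings.

```python
import numpy as np

def epsilon_greedy(Q_table, s, n_actions, epsilon):
    """With probability epsilon take a random action, otherwise the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q_table[s]))

def annealed_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.995):
    """Decay exploration over episodes: explore early, exploit later."""
    return max(eps_min, eps_start * decay ** episode)
```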

Additionally, can you explain the logic behind your reward function? It seems unusual to multiply everything by 1000.
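For what it's worth, a uniformly rescaled version of the same shaping keeps the relative weighting of the terms (which is what actually shapes the behaviour) while keeping the Q-values, and hence the TD errors, small. The 0.1 weights below are just the original 100/1000 ratio; this is a sketch of an alternative, not a statement about your setup.

```python
import numpy as np

def shaped_reward(x, x_dot, theta, theta_dot):
    # same relative weights as the original reward, divided by 1000
    return np.cos(theta) - theta_dot**2 - 0.1*x_dot**2 - 0.1*x**2
```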