Game-like model in Q-learning

I have a modeling question. Sorry, I am new to reinforcement learning.

Suppose we have a Pac-Man-style game. The agent has access to three circles in front of it (left-front, center-front and right-front) and must eat the dots it encounters (skipping a dot is penalized more heavily). Dots appear randomly and have different weights, either positive or negative. I want to find the optimal score (the sum of the weights of the eaten dots) and/or the optimal length of the chain of dots the agent encounters for which it still scores positive.
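
A minimal sketch of how these rules could be encoded as a reward function. It assumes that a state is the tuple of dot weights in the (left-front, center-front, right-front) circles, with 0 meaning an empty circle, and that a made-up `SKIP_PENALTY` is charged once for every dot the agent passes without eating. The names `ACTIONS`, `SKIP_PENALTY` and `step_reward` are hypothetical.

```python
# Hypothetical encoding (an assumption, not taken from the question):
# a state is the tuple of dot weights in the (left-front, center-front,
# right-front) circles, and 0 means "no dot in this circle".
SKIP_PENALTY = -2.0  # made-up cost for each dot the agent does not eat
ACTIONS = ("left", "center", "right")


def step_reward(circles, action):
    """Reward for moving into the circle named by `action`.

    The agent scores the weight of the dot it eats (0 if the circle is
    empty) and pays SKIP_PENALTY for every dot in the other two circles.
    """
    chosen = ACTIONS.index(action)
    eaten = circles[chosen]
    skipped = sum(1 for i, w in enumerate(circles) if i != chosen and w != 0)
    return float(eaten) + SKIP_PENALTY * skipped
```

For example, `step_reward((-1, 0, 3), "right")` eats the dot of weight 3 and skips the one of weight -1, giving 3 + SKIP_PENALTY = 1.0. The score to maximize is then the sum of these per-step rewards over an episode.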

I want to train a Q-learning model for this (though I doubt it is the right approach). I plan to use policy-based iteration next, because the value-based model gave me a rather linear solution in a stochastic state space (there is only one decision per state that the agent can change).
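
For reference, the standard tabular Q-learning update, written with this game's quantities as an assumption: $s$ is the contents of the three circles, $a \in \{\text{left}, \text{center}, \text{right}\}$, $r$ is the reward (the eaten dot's weight or the skip penalty), $\alpha$ is the learning rate and $\gamma$ the discount factor. When $s'$ is terminal, the target does not bootstrap from $Q(s', \cdot)$.

```latex
y =
\begin{cases}
  r & \text{if } s' \text{ is terminal},\\
  r + \gamma \max_{a'} Q(s', a') & \text{otherwise},
\end{cases}
\qquad
Q(s, a) \leftarrow Q(s, a) + \alpha \bigl( y - Q(s, a) \bigr)
```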

  • I don't know whether this problem is solvable in theory.

  • The dots appear on the fly in a random circle next to the agent. For example, the possible next states [±1, 0, 0], [0, ±1, 0] and [0, 0, ±1] are equally likely. I have trouble posing the problem the right way and defining a terminal state.
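
A minimal sketch of one way to pose this with a terminal state, assuming dot weights of exactly +1 or -1, a made-up `SKIP_PENALTY` of -2, and a fixed horizon of `HORIZON = 10` dots per episode (all names and constants are hypothetical). The time step is part of the state, so the terminal state is simply "t reached HORIZON", and the update stops bootstrapping there. `sample_circles` draws the next state from the uniform distribution over [±1, 0, 0], [0, ±1, 0] and [0, 0, ±1] described above; the learner is plain tabular Q-learning with epsilon-greedy exploration.

```python
import random
from collections import defaultdict

# Hypothetical constants (the question does not fix them):
SKIP_PENALTY = -2.0  # assumed cost of letting a dot pass uneaten
HORIZON = 10         # assumed number of dots per episode
N_ACTIONS = 3        # 0 = left-front, 1 = center-front, 2 = right-front


def sample_circles(rng):
    """One dot of weight +1 or -1 in a uniformly chosen circle (6 equally likely states)."""
    circles = [0, 0, 0]
    circles[rng.randrange(3)] = rng.choice((-1, 1))
    return tuple(circles)


def reward(circles, action):
    """Weight of the eaten dot, or SKIP_PENALTY if the dot was in another circle."""
    return float(circles[action]) if circles[action] != 0 else SKIP_PENALTY


def greedy_action(q, state):
    """Action with the highest Q-value (ties go to the lowest index)."""
    return max(range(N_ACTIONS), key=lambda a: q[state][a])


def train(episodes=5000, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Tabular Q-learning; a state is (t, circles) and t == HORIZON is terminal."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0] * N_ACTIONS)
    for _ in range(episodes):
        state = (0, sample_circles(rng))
        while True:
            t, circles = state
            # Epsilon-greedy exploration.
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)
            else:
                action = greedy_action(q, state)
            r = reward(circles, action)
            done = t + 1 == HORIZON  # the next state would be terminal
            if done:
                target = r  # no future value after the last dot
            else:
                next_state = (t + 1, sample_circles(rng))
                target = r + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = next_state
    return q
```

After `q = train()`, `greedy_action(q, (0, (0, 1, 0)))` should normally return 1 (move into the center circle holding the +1 dot). Putting the step counter in the state makes this a finite-horizon problem in which every episode ends after `HORIZON` dots; an alternative is a discount factor `gamma < 1` with no terminal state at all.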

Can you guide me?
