Q-Learning Intermediate Rewards

Question

Asked by Uzay Macar on 04 December 2018 at 23:10

If a Q-Learning agent performs noticeably better against opponents in a specific card game when intermediate rewards are included, does this point to a flaw in the algorithm or a flaw in its implementation?

There is 1 answer

**Gracie** · Accepted Answer · 2019-01-18T08:55:57+00:00

It's difficult to answer this question without more specific information about the Q-Learning agent. You might describe the seeking of immediate rewards as exploitation, and the exploitation rate is the complement of the exploration rate: with an epsilon-greedy policy, for instance, the agent exploits with probability 1 - epsilon. It should be possible to configure both this and the learning rate in your implementation. The other important factor is the choice of exploration strategy, and there are plenty of resources that will help you make this choice. For example:

http://www.ai.rug.nl/~mwiering/GROUP/ARTICLES/Exploration_QLearning.pdf

https://www.cs.mcgill.ca/~vkules/bandits.pdf

To answer the question directly: it may be a matter of implementation, configuration, agent architecture, or learning strategy that leads to immediate exploitation and a fixation on local optima.
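
To make the configurable pieces mentioned above concrete, here is a minimal sketch of a tabular Q-Learning agent with an epsilon-greedy exploration strategy. It is not taken from the asker's code: the class name `QAgent`, the parameter names `alpha` (learning rate), `gamma` (discount factor), and `epsilon` (exploration rate), and the action labels in the usage note are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class QAgent:
    """Hypothetical tabular Q-Learning agent with epsilon-greedy exploration."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1, seed=None):
        self.actions = list(actions)
        self.alpha = alpha      # learning rate: how far each update moves the estimate
        self.gamma = gamma      # discount factor: weight given to future rewards
        self.epsilon = epsilon  # exploration rate: probability of a random action
        self.q = defaultdict(float)  # (state, action) -> estimated value, default 0.0
        self.rng = random.Random(seed)

    def choose(self, state):
        # Explore with probability epsilon, otherwise exploit the current estimates.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        # Standard Q-Learning target: reward plus the discounted best next value.
        # A terminal transition has no next value to bootstrap from.
        best_next = 0.0 if done else max(
            self.q[(next_state, a)] for a in self.actions
        )
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

A usage example under the same assumptions: `agent = QAgent(["play", "pass"], epsilon=0.2)`, then call `agent.choose(state)` to pick a move and `agent.update(state, action, reward, next_state, done)` after each transition. Whether `reward` is nonzero only at the end of a game or also at intermediate steps is decided by the environment, not by this update rule, so comparing the two setups with the same `alpha` and `epsilon` is one way to narrow down where the difference comes from.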
