In the Q-learning algorithm, the selection of an action depends on the current state and the values of the Q-matrix. I want to know whether these Q-values are updated only during the exploration step, or whether they also change during the exploitation step.
If you read the Q-learning pseudocode, for example from the Sutton & Barto book:
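Here is a minimal Python sketch along the lines of that pseudocode (the `env` interface with `reset`, `step`, and `actions` is an assumption for illustration, as are the hyperparameter values):

```python
import random
from collections import defaultdict

def q_learning(env, num_episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy behaviour policy.

    `env` is assumed to expose `env.actions`, `env.reset() -> state` and
    `env.step(action) -> (next_state, reward, done)`; adapt as needed.
    """
    Q = defaultdict(float)  # Q[(state, action)], initialised to 0

    def epsilon_greedy(state):
        # Explore with probability epsilon, otherwise act greedily w.r.t. Q.
        if random.random() < epsilon:
            return random.choice(env.actions)
        return max(env.actions, key=lambda a: Q[(state, a)])

    for _ in range(num_episodes):
        state = env.reset()
        done = False
        while not done:
            action = epsilon_greedy(state)  # may be greedy or exploratory
            next_state, reward, done = env.step(action)
            # The update below runs on EVERY step, regardless of how the
            # action was chosen; the max over next actions is what makes
            # Q-learning off-policy.
            best_next = 0.0 if done else max(Q[(next_state, a)] for a in env.actions)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```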
It seems pretty clear that the Q-values are always updated, regardless of whether the chosen action was exploratory or not.
Notice that the line "Choose a from s using policy derived from Q (e.g., epsilon-greedy)" means that the action will sometimes be exploratory, yet the update that follows is applied in either case.