Does initializing a Q-table with zeros introduce bias towards the first action in reinforcement learning?

I'm working on a reinforcement learning problem where I've initialised the Q-table with zeros. I noticed that when multiple actions have the same Q-value, the argmax operation always selects the first one, since it returns the first index among tied maxima.
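To illustrate, here is a minimal NumPy sketch of the tie behaviour I mean (the array is just a stand-in for one row of my Q-table):

```python
import numpy as np

# One row of a freshly initialised Q-table: all actions tie at 0.
q_row = np.zeros(4)

# np.argmax returns the index of the FIRST maximum, so on a tie
# it always picks action 0.
print(np.argmax(q_row))  # -> 0
```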

Doesn't this create a bias towards the first action? Is it a bad idea to initialise the Q-table with zeros? If so, what should be done instead?

I tried initialising the Q-table with zeros and it works. However, wherever the Q-values are still all zero, the first action is selected. For example, in the goal state all values remain zero, so the first action is always chosen. I was under the impression that any action is acceptable in the goal state, but the behaviour differs depending on which action gets picked. If I initialise the table with some other constant, all actions are still tied, so wouldn't the same problem remain?

1 Answer

Answered by proof-of-correctness:

Yes, you're right: as long as ties are resolved by taking the first index, an all-zero initialisation biases the agent towards the first action in every state it hasn't learned anything about yet.
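A common remedy is to break ties uniformly at random instead of always taking the first index. A minimal sketch, assuming a NumPy array Q indexed as Q[state, action] (the function name and shapes are just for illustration):

```python
import numpy as np

def greedy_action(Q, state, rng=None):
    """Return a greedy action for `state`, breaking ties at random."""
    rng = np.random.default_rng() if rng is None else rng
    q_values = Q[state]
    best = np.flatnonzero(q_values == q_values.max())  # indices of all tied maxima
    return rng.choice(best)

# With an all-zero row, every action is now equally likely.
Q = np.zeros((5, 4))  # 5 states, 4 actions (illustrative sizes)
print(greedy_action(Q, state=0))
```

With this in place, a zero-initialised table no longer favours any particular action in unvisited states.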

Under the usual conditions (tabular Q-learning, every state-action pair visited infinitely often, appropriately decaying learning rates), the algorithm converges to the optimal Q-values regardless of the initial values, so this won't be much of a problem in the long run.

It will affect the rate of convergence, though. For example, if we set the initial values high (optimistic initialisation), every untried action looks better than it really is, so the agent is incentivised to explore more, which slows down how quickly the value estimates settle.
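As a sketch of what optimistic initialisation can look like (the reward bound r_max and discount gamma are assumed values, not something from the question):

```python
import numpy as np

n_states, n_actions = 5, 4  # illustrative sizes
r_max, gamma = 1.0, 0.9     # assumed reward bound and discount factor

# Optimistic initialisation: start every Q-value at an upper bound on the
# discounted return, so untried actions look attractive and get tried at
# least once before their estimates decay towards the true values.
Q = np.full((n_states, n_actions), r_max / (1 - gamma))
```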

In general, initialising values to 0 should be fine, but I'll let someone else give a more detailed answer about which values are pragmatically better (and in what situations).