Deep neural network diverges after convergence


I implemented the A3C network in https://arxiv.org/abs/1602.01783 in TensorFlow.

At this point I'm 90% sure the algorithm is implemented correctly. However, the network diverges after convergence. See the attached image that I got from a toy example where the maximum episode reward is 7.

When it diverges, the policy network starts assigning a single action very high probability (>0.9) for most states.

What should I check for this kind of problem? Is there any reference for it?



1 Answer

jaromiru

Note that in Figure 1 of the original paper the authors say:

For asynchronous methods we average over the best 5 models from 50 experiments.

That can mean that in a lot of cases the algorithm does not work that well. In my experience, A3C often diverges, even after converging. Careful learning-rate scheduling can help. Or do what the authors did: train several agents with different seeds and pick the one that performs best on your validation data. You could also employ early stopping when the validation error starts to increase.
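The early-stopping idea can be sketched as follows. This is a minimal illustration, not A3C itself: the `evaluate` callback, the `patience` parameter, and the toy reward curve are all assumptions made for the example.

```python
# Hypothetical sketch of early stopping on validation reward: stop
# training when the reward has not improved for `patience` consecutive
# evaluations, and keep the best score seen so far.

def train_with_early_stopping(evaluate, max_steps=1000, patience=5):
    """Call `evaluate(step)` each step; return the best reward seen and
    the step at which training stopped."""
    best_reward = float("-inf")
    steps_since_best = 0
    for step in range(max_steps):
        reward = evaluate(step)
        if reward > best_reward:
            best_reward = reward
            steps_since_best = 0      # improvement: reset patience counter
        else:
            steps_since_best += 1
        if steps_since_best >= patience:
            break                     # validation reward stopped improving
    return best_reward, step

# Toy reward curve: rises to 7, then diverges (drops), like in the question.
curve = [1, 3, 5, 7, 6, 5, 4, 3, 2, 1, 0, 0]
best, stopped_at = train_with_early_stopping(
    lambda s: curve[min(s, len(curve) - 1)], max_steps=50, patience=3)
# Training halts shortly after the peak instead of riding the divergence down.
```

For the multi-seed approach the authors describe, you would run this loop once per seed and keep the agent whose `best_reward` is highest on held-out validation episodes.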