When I use DDPG to solve a resource allocation problem in a communication network, I get an odd result: the reward keeps decreasing during training. At the same time, the critic's loss converges to its minimum value and the actor's loss keeps decreasing. The loss curves of both networks look normal. I have tried adjusting the hyperparameters and the network size, but the reward trend stays the same. What could cause this? Thank you for your help.