I am using Stable Baselines 3 to train an agent to play Connect 4. I am trying to handle the case where the agent starts the game as the second player.
self.env = self.ks_env.train([opponent, None])
When I run the code, I get the following error:
invalid multinomial distribution (encountering probability entry < 0)
/opt/conda/lib/python3.7/site-packages/torch/distributions/categorical.py in sample(self, sample_shape)
samples_2d = torch.multinomial(probs_2d, sample_shape.numel(), True).T
However, there is no problem when the agent is the first player:
self.env = self.ks_env.train([None, opponent])
I think the problem is related to the PyTorch library. How can I fix this issue?
After checking the code you provided, the problem doesn't seem to come from which agent starts the game, but from not restarting the environment after a game is done.
I just changed your step function as shown:
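For illustration only, here is a minimal sketch of the general idea: a `step` that restarts the game as soon as it ends, so that the next call never acts on a finished game. `ToyGame` and `ResettingWrapper` are hypothetical names, and `ToyGame` is only a stand-in for the trainer returned by `ks_env.train(...)`. This is not the notebook's actual code.

```python
class ToyGame:
    """Hypothetical stand-in for the trainer returned by ks_env.train(...)."""

    def __init__(self, max_moves=3):
        self.max_moves = max_moves
        self.moves = 0
        self.done = True  # no game in progress until reset() is called

    def reset(self):
        self.moves = 0
        self.done = False
        return [0] * 42  # empty 6x7 board, flattened

    def step(self, action):
        if self.done:
            # Stepping a finished game is where invalid observations would appear.
            raise RuntimeError("step() called on a finished game")
        self.moves += 1
        self.done = self.moves >= self.max_moves
        reward = 1 if self.done else 0
        return [0] * 42, reward, self.done, {}


class ResettingWrapper:
    """Wrapper whose step() restarts the game as soon as it ends."""

    def __init__(self, env):
        self.env = env
        self.obs = self.env.reset()

    def reset(self):
        self.obs = self.env.reset()
        return self.obs

    def step(self, action):
        self.obs, reward, done, info = self.env.step(action)
        if done:
            # Restart immediately so the next step() acts on a fresh game.
            self.obs = self.env.reset()
        return self.obs, reward, done, info


env = ResettingWrapper(ToyGame(max_moves=3))
for _ in range(10):  # spans several games without hitting a finished one
    obs, reward, done, info = env.step(0)
```

Without the reset inside `step`, the fourth call in this toy example would hit the finished game and raise.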
With this, the model was able to train and you can check that it works as expected with the following snippet:
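As a rough sketch of how such a check could look in general (again with hypothetical names, and a toy environment instead of the Connect 4 one), the function below steps through many episode boundaries with random actions and verifies that every observation is finite. It assumes a gym-style `reset()` / `step()` interface returning `(obs, reward, done, info)`.

```python
import math
import random


class ToyEnv:
    """Hypothetical gym-style environment; each episode lasts `length` steps."""

    def __init__(self, length=4):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return [0.0] * 42

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        obs = [0.0] * 42
        obs[action] = 1.0  # mark the chosen column
        return obs, float(done), done, {}


def rollout_check(env, n_steps, n_actions=7, seed=0):
    """Step across episode boundaries; raise if an observation is invalid."""
    rng = random.Random(seed)
    env.reset()
    episodes = 0
    for _ in range(n_steps):
        action = rng.randrange(n_actions)
        obs, reward, done, info = env.step(action)
        if obs is None or not all(math.isfinite(x) for x in obs):
            raise ValueError("invalid observation returned by step()")
        if done:
            episodes += 1
            env.reset()  # start the next game before stepping again
    return episodes


completed = rollout_check(ToyEnv(length=4), n_steps=20)
print(f"completed episodes: {completed}")
```

Stable Baselines 3 also provides `check_env` in `stable_baselines3.common.env_checker`, which may catch related interface problems in a custom environment.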
Link to my version of your notebook