Pytorch - RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

Question

1.5k views · Asked by Joe Rakhimov on 20 October 2020 at 08:53

I am using Stable Baselines 3 to train an agent to play Connect 4, and I am trying to handle the case where the agent starts the game as the second player:

```python
self.env = self.ks_env.train([opponent, None])
```

When I run the code, I get the following error:

```
RuntimeError: invalid multinomial distribution (encountering probability entry < 0)
/opt/conda/lib/python3.7/site-packages/torch/distributions/categorical.py in sample(self, sample_shape)
samples_2d = torch.multinomial(probs_2d, sample_shape.numel(), True).T
```
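The last frame of the traceback is `torch.multinomial`, which `Categorical.sample` uses to draw an action from the policy's probabilities. It refuses a probability vector whose entries are not all non-negative. Because softmax outputs cannot actually be negative, this message in practice usually points to NaN values (a NaN fails a `p >= 0` check). The following is a framework-independent sketch of that precondition in plain Python; `check_probs` and `sample_action` are hypothetical helpers for illustration, not part of PyTorch or Stable Baselines 3.

```python
import math
import random
from typing import Sequence


def check_probs(probs: Sequence[float]) -> None:
    """Raise ValueError if probs is not a usable categorical distribution.

    Hypothetical helper mirroring the spirit of torch.multinomial's precondition:
    every entry must be finite and non-negative, and the total must be positive.
    """
    for i, p in enumerate(probs):
        # NaN fails `p >= 0`, which is how a NaN can surface as "entry < 0"
        if not (p >= 0) or math.isinf(p):
            raise ValueError(f"invalid probability at index {i}: {p}")
    if sum(probs) <= 0:
        raise ValueError("probabilities sum to zero")


def sample_action(probs: Sequence[float], rng: random.Random) -> int:
    """Validate probs, then draw one index with probability proportional to probs."""
    check_probs(probs)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_action([0.1, 0.2, 0.7], rng))  # some column index in 0..2
    for bad in ([0.5, -0.1, 0.6], [float("nan"), 0.5, 0.5]):
        try:
            sample_action(bad, rng)
        except ValueError as err:
            print("rejected:", err)
```

Validating before sampling turns a bad distribution into an explicit error that names the offending entry, which makes it easier to trace back to where the NaN first appeared.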

However, there is no problem when the agent is the first player:

```python
self.env = self.ks_env.train([None, opponent])
```

I think the problem is related to the PyTorch library. How can I fix this issue?

**Heladio Amaya** · Accepted Answer · 2020-11-03T06:17:32+00:00

After checking the code you provided, the problem doesn't seem to come from which agent starts the game, but from not resetting the environment after a game is done.

I only changed your `step` function, as shown:

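```python
import numpy as np


def step(self, action):
    # Check if the agent's move is valid (the top cell of the chosen column must be empty)
    is_valid = (self.obs['board'][int(action)] == 0)
    if is_valid:  # Play the move
        self.obs, old_reward, done, info = self.env.step(int(action))
        reward = self.change_reward(old_reward, done)
    else:  # End the game and penalize the agent
        reward, done, info = -10, True, {}
    if done:
        # Start a new game so that self.obs never refers to a finished board
        self.reset()
    # board_flip is a helper defined elsewhere in the asker's notebook (not shown here)
    obs = board_flip(self.obs.mark,
                     np.array(self.obs['board']).reshape(1, self.rows, self.columns) / 2)
    return obs, reward, done, info
```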
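The essence of the fix is an auto-reset: as soon as a game is finished, the environment starts a new one, so every later `step` call works on a live game rather than on the board of a game that has already ended. Below is a minimal, framework-free sketch of that pattern. `ToyGame` is a hypothetical stand-in for the Kaggle trainer (a "game" that simply ends after a fixed number of moves), and `AutoResetEnv` is a hypothetical wrapper with the same shape as the answer's `step`, minus the Connect 4 details.

```python
from typing import Any, Dict, Tuple

Obs = Dict[str, int]


class ToyGame:
    """Hypothetical stand-in for the Kaggle trainer: the game ends after max_moves moves."""

    def __init__(self, max_moves: int = 3) -> None:
        self.max_moves = max_moves
        self.moves = 0

    def reset(self) -> Obs:
        self.moves = 0
        return {"moves": self.moves}

    def step(self, action: int) -> Tuple[Obs, float, bool, Dict[str, Any]]:
        self.moves += 1
        done = self.moves >= self.max_moves
        reward = 1.0 if done else 0.0
        return {"moves": self.moves}, reward, done, {}


class AutoResetEnv:
    """Hypothetical wrapper that starts a new game as soon as the current one is done."""

    def __init__(self, game: ToyGame) -> None:
        self.game = game
        self.obs: Obs = self.game.reset()

    def reset(self) -> Obs:
        self.obs = self.game.reset()
        return self.obs

    def step(self, action: int) -> Tuple[Obs, float, bool, Dict[str, Any]]:
        self.obs, reward, done, info = self.game.step(action)
        if done:
            # Same idea as the answer: never leave self.obs on a finished game
            self.reset()
        return self.obs, reward, done, info


if __name__ == "__main__":
    env = AutoResetEnv(ToyGame(max_moves=3))
    for _ in range(4):
        print(env.step(0))
```

Like the answer's version, the step that ends a game returns `done=True` together with the first observation of the next game rather than the final board; keeping the final board as well would require storing it separately before calling `reset`.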

With this change, the model trains, and you can check that the environment behaves as expected with the following snippet:

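```python
# env: an instance of the custom Connect 4 environment from the notebook
done = True
for step in range(500):
    if done:
        state = env.reset()
    # Play a random column and print the reward received
    state, reward, done, info = env.step(env.action_space.sample())
    print(reward)
```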
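The random-action loop above only prints the rewards. A slightly stricter variant, sketched below under the assumption that the environment follows the same `reset()` / `step(action) -> (obs, reward, done, info)` interface, raises as soon as a reward is not a finite number, since a NaN that reaches the learner can later show up as NaN action probabilities and the multinomial error. `CoinGame` and `check_rollout` are hypothetical names for illustration; with the real environment you would pass `env` and `env.action_space.sample` instead.

```python
import math
import random
from typing import Any, Callable, Dict, Tuple


class CoinGame:
    """Hypothetical stand-in environment: each move ends the game with probability 1/4."""

    def __init__(self, seed: int = 0) -> None:
        self.rng = random.Random(seed)

    def reset(self) -> int:
        return 0

    def step(self, action: int) -> Tuple[int, float, bool, Dict[str, Any]]:
        done = self.rng.random() < 0.25
        return action, (1.0 if done else 0.0), done, {}


def check_rollout(env: Any, sample_action: Callable[[], int], n_steps: int = 500) -> int:
    """Run a random rollout like the snippet above, but fail loudly on bad rewards.

    Returns the number of finished games. A reward that is None, NaN or inf
    raises ValueError here instead of silently reaching the learner.
    """
    done, games = True, 0
    for t in range(n_steps):
        if done:
            env.reset()
        _, reward, done, _ = env.step(sample_action())
        if not isinstance(reward, (int, float)) or not math.isfinite(reward):
            raise ValueError(f"step {t}: non-finite reward {reward!r}")
        games += int(done)
    return games


if __name__ == "__main__":
    print(check_rollout(CoinGame(seed=1), lambda: 3, n_steps=200))
```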

Link to my version of your notebook

TechQA.
