n_state, reward, done, info = env.step(action) raises a ValueError

episodes = 10
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0

    while not done:
        env.render()
        action = random.choice([0, 1])
        n_state, reward, done, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))

The line n_state, reward, done, info = env.step(action) raises this error:

ValueError                                Traceback (most recent call last)
Cell In[51], line 10
      8     env.render()
      9     action = random.choice([0,1])
---> 10     n_state, reward, done, info = env.step(action)
     11     score+=reward
     12 print('Episode:{} Score:{}'.format(episode, score))

ValueError: too many values to unpack (expected 4)

This code comes from a tutorial video, where it appears to work, but it always raises this error for me. Here is the full script:

import os
import random

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy

environment_name = "CartPole-v0"
env = gym.make(environment_name)

episodes = 10
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0

    while not done:
        env.render()
        action = random.choice([0, 1])
        n_state, reward, done, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))
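To check which step API the installed Gym version uses, a minimal diagnostic (assuming a fresh CartPole environment) is:

import gym
print(gym.__version__)  # Gym 0.26+ returns a 5-tuple from env.step

env = gym.make("CartPole-v0")
env.reset()
result = env.step(env.action_space.sample())
print(len(result))  # 4 on older versions, 5 on 0.26+, which explains the unpack error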

1 Answer

Answered by Lexpj

More recent Gym versions return a 5-tuple from env.step(action): observation, reward, terminated, truncated, and info. truncated is a boolean that signals an unexpected end of the episode, such as hitting a time limit, rather than the environment reaching a terminal state. The consequence is the same either way: the agent-environment loop should end.

Thus what you would actually want to do is:

done = False
truncated = False  # initialize before the first loop check, or it is undefined

while not (done or truncated):
    env.render()
    action = random.choice([0, 1])
    n_state, reward, done, truncated, info = env.step(action)
    score += reward
print('Episode:{} Score:{}'.format(episode, score))
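If the same script has to run under both the old 4-tuple and the new 5-tuple API, one option is a small compatibility wrapper. This is only a sketch; step_compat is a made-up helper name, not part of Gym:

def step_compat(env, action):
    # Normalize env.step output to (obs, reward, done, info) across Gym APIs.
    result = env.step(action)
    if len(result) == 5:
        # Gym 0.26+: obs, reward, terminated, truncated, info
        obs, reward, terminated, truncated, info = result
        return obs, reward, terminated or truncated, info
    # Older Gym: already (obs, reward, done, info)
    return result

With that wrapper, the original 4-way unpacking keeps working unchanged: n_state, reward, done, info = step_compat(env, action).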