episodes = 10
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0
    while not done:
        env.render()
        action = random.choice([0,1])
        n_state, reward, done, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))
The line n_state, reward, done, info = env.step(action) raises this error:
ValueError Traceback (most recent call last)
Cell In[51], line 10
8 env.render()
9 action = random.choice([0,1])
---> 10 n_state, reward, done, info = env.step(action)
11 score+=reward
12 print('Episode:{} Score:{}'.format(episode, score))
ValueError: too many values to unpack (expected 4)
This code appears in a tutorial video, where it seems to work, but it always raises this error for me. Here is the full script:
import os
import random

import gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.evaluation import evaluate_policy

environment_name = "CartPole-v0"
env = gym.make(environment_name)

episodes = 10
for episode in range(1, episodes+1):
    state = env.reset()
    done = False
    score = 0
    while not done:
        env.render()
        action = random.choice([0,1])
        n_state, reward, done, info = env.step(action)
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))
More recent gym versions return a 5-tuple from env.step(action), namely state, reward, terminated, truncated, and info. The truncated flag is a boolean that signals an unexpected end of the episode, such as a time limit being reached or a non-existent state. The consequence is the same either way: the agent-environment loop should end. Thus what you would actually want to do is:
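episodes = 10
for episode in range(1, episodes+1):
    # Recent gym versions also return an (observation, info) pair from reset().
    state, info = env.reset()
    done = False
    score = 0
    while not done:
        env.render()
        action = random.choice([0,1])
        # env.step() now returns five values; the episode is over when either
        # terminated (a terminal state was reached) or truncated (e.g. a time
        # limit was hit) is True.
        n_state, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated
        score += reward
    print('Episode:{} Score:{}'.format(episode, score))

If env.render() no longer shows a window, note that recent gym releases also expect the render mode at construction time, e.g. env = gym.make(environment_name, render_mode="human"), rather than at render time.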