I'm running code to train a PPO policy on chess using PettingZoo:
import gym.vector.utils
import supersuit as ss
import stable_baselines3.ppo
import pettingzoo.classic
if __name__ == '__main__':
    env = original_env = pettingzoo.classic.chess_v5.env()
    env = pettingzoo.utils.turn_based_aec_to_parallel(env)
    env = ss.pettingzoo_env_to_vec_env_v1(env)
    env = ss.concat_vec_envs_v1(env, 8, num_cpus=4, base_class='stable_baselines3')
    model = stable_baselines3.PPO(stable_baselines3.ppo.MultiInputPolicy, env,
                                  tensorboard_log='my_logs')
    model.learn(total_timesteps=100)
In the next to last line, you can see I'm outputting logs to TensorBoard, where I hope to see a nice graph. However, all I see is this:
I've used TensorBoard before and it worked. Why isn't it showing any progress now? Or even lack of progress?
Turns out I just needed to use a lower value for n_steps.
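To illustrate why a lower n_steps helps, here is a rough sketch of the arithmetic. As far as I understand, Stable-Baselines3's PPO collects n_steps steps from every parallel environment (2048 by default) before each update, and writes a log entry once per such rollout, so a short run produces very few points for TensorBoard to plot. The helper name rollouts_logged is hypothetical, and the figure of 16 parallel environments (8 copies times 2 chess agents) is an assumption about how the wrappers above count environments.

```python
import math


def rollouts_logged(total_timesteps, n_steps, n_envs):
    """Estimate how many rollouts (and so log points) a PPO run produces.

    Assumes one log entry per rollout and that each rollout gathers
    n_steps * n_envs timesteps before training continues.
    """
    steps_per_rollout = n_steps * n_envs
    return math.ceil(total_timesteps / steps_per_rollout)


# Assumption: 8 concatenated copies x 2 agents = 16 parallel environments.
n_envs = 16

# Default n_steps: even a 100,000-step run yields only a handful of points.
print(rollouts_logged(100, 2048, n_envs))
print(rollouts_logged(100_000, 2048, n_envs))

# Smaller n_steps: many more points for the same number of timesteps.
print(rollouts_logged(100_000, 64, n_envs))

# In the training script, n_steps is passed to the PPO constructor, e.g.:
# model = stable_baselines3.PPO(stable_baselines3.ppo.MultiInputPolicy, env,
#                               n_steps=64, tensorboard_log='my_logs')
```

The value 64 is only an example. A smaller n_steps gives more frequent log points but also smaller batches per update, so it may be worth raising total_timesteps as well rather than shrinking n_steps alone.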