Parallel environments in Pong keep ending up in the same state despite random actions being taken

Hi, I am trying to use SubprocVecEnv to run 8 parallel Pong environment instances. I tested the state transitions using random actions, but after 15 steps (with random left or right actions) the states of all the environments are the same. How did this happen, and did I do something wrong? Shouldn't the environment states all be different? I checked the actions taken and, on net, they are all different (i.e. after 15 steps, some of the agents have taken more left than right actions and vice versa).

Can someone explain why all the environments end up in the same state even after 15 steps of random actions? My concern is that nothing new is learned across environments if they all follow the same trajectory. Thanks a lot!

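import gym
import numpy as np
from baselines.common.vec_env.subproc_vec_env import SubprocVecEnv

env_name = 'PongDeterministic-v4'

def make_env(env_name, seed):
    # Return a thunk so that each subprocess builds its own environment
    def f_():
        env = gym.make(env_name)
        env.seed(seed)
        return env
    return f_

# Every environment gets the same seed (42)
envs = [make_env(env_name, 42) for _ in range(8)]
envs = SubprocVecEnv(envs)

envs.reset()
# Take 15 steps with a random action (4 or 5) per environment
for _ in range(15):
    fr1, _, _, _ = envs.step(np.random.choice([4, 5], 8))

# Compare each environment's final frame with that of environment 0
base = fr1[0, :, :, :]
for i in range(fr1.shape[0]):
    # Note: .all() reduces each frame to a single boolean before comparing
    if fr1[i, :, :, :].all() == base.all():
        print('Match :(')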

Match :(
Match :(
Match :(
Match :(
Match :(
Match :(
Match :(
Match :(
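
A side note on the check that produced this output: `x.all() == y.all()` compares two booleans (whether every pixel in each frame is nonzero), so it can report a match for frames that differ. This does not show whether the frames in this run were actually identical. A minimal sketch of an elementwise comparison using `np.array_equal`, with small made-up arrays standing in for Pong frames (the helper names `frames_match` and `count_matches` are hypothetical):

```python
import numpy as np

def frames_match(a, b):
    """True only if both frames have the same shape and identical pixels."""
    return np.array_equal(a, b)

def count_matches(frames):
    """Count how many frames (other than the first) equal the first one."""
    base = frames[0]
    return sum(frames_match(frames[i], base) for i in range(1, len(frames)))

# Two tiny stand-in "frames" that differ in a single pixel
a = np.zeros((2, 2, 3), dtype=np.uint8)
b = a.copy()
b[0, 0, 0] = 255

# Both frames contain zeros, so both .all() calls give False and compare equal
naive_equal = bool(a.all() == b.all())
# The elementwise comparison sees the differing pixel
exact_equal = frames_match(a, b)

print(naive_equal, exact_equal)
print(count_matches(np.stack([a, b, a])))
```

Skipping index 0 in `count_matches` avoids counting the trivial match of the first environment against itself.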

There is 1 answer

Answer by Swami

OK, figured it out. I was using the same seed for all the envs. The line

envs=[make_env(env_name,42) for _ in range(8)]

should be changed to

envs=[make_env(env_name,i) for i in range(8)]  # seed as some function of i
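
To illustrate the idea without needing gym installed, here is a rough sketch that uses Python's `random.Random` as a stand-in for a seeded environment. The helpers `make_stream` and `make_seeds` are hypothetical and are not part of gym or baselines. They only show that generators built from one seed produce identical sequences, while a distinct seed per worker (here `base_seed + rank`, one possible "function of i") makes the sequences diverge.

```python
import random

def make_stream(seed, n=15):
    """Stand-in for a seeded environment: n pseudo-random draws from one seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 5) for _ in range(n)]

def make_seeds(base_seed, num_envs):
    """Hypothetical helper: derive one distinct seed per worker from a base seed."""
    return [base_seed + rank for rank in range(num_envs)]

# Same seed for every worker, as in the original question
same = [make_stream(42) for _ in range(8)]
# One seed per worker, as in the fix
distinct = [make_stream(s) for s in make_seeds(42, 8)]

all_same_identical = all(s == same[0] for s in same)
num_unique_distinct = len({tuple(s) for s in distinct})

print(all_same_identical)
print(num_unique_distinct)
```

Deriving the seeds from a single base value keeps the whole run reproducible while still giving each worker its own random stream.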