I am using the PPO algorithm provided by Ray to train an RL agent to stabilize traffic. During training, I keep seeing this error: ValueError('Observation outside expected value range', Box(500,)
However, I don't know which part of my script is causing this issue, or whether it is caused by Flow at all.
 
                        
Oof, yes, that's a very small bug caused by the RLlib upgrade. Basically, the Ray version we used to use wasn't strict about observations staying within the bounds of the observation space, but the new version is. You can fix this by going into the corresponding environment and changing the low and high values of the observation space to be slightly more permissive (say, -2 to 2 instead of the current -1 to 1).
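
If you want to see which entries are actually leaving the range before widening the bounds, something like the following might help. It is only a rough sketch: find_out_of_range is a hypothetical helper (not part of Flow or Ray), the observation values are made up, and the default bounds of -1 and 1 are the ones mentioned above. It assumes a flat, 1-D observation.

```python
def find_out_of_range(obs, low=-1.0, high=1.0):
    """Return (index, value) pairs for entries of a 1-D observation outside [low, high].

    Hypothetical debugging helper; works on a plain list or any 1-D iterable of numbers.
    NaN entries are reported too, since they fail the range comparison.
    """
    return [
        (i, float(v))
        for i, v in enumerate(obs)
        if not (low <= float(v) <= high)
    ]


# Made-up observation for illustration only.
obs = [0.2, -0.5, 1.3, 0.9, -1.7]

# With the current bounds, two entries are flagged.
print(find_out_of_range(obs))

# With the more permissive bounds suggested above, none are.
print(find_out_of_range(obs, low=-2.0, high=2.0))
```

Calling this on the state your environment returns, just before it is handed to RLlib, should show whether the overshoot is small (in which case looser bounds are probably fine) or whether some state component is not being normalized at all.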