I am trying to transition from hand-coding RL algorithms myself to using libraries like Stable-Baselines3 and RLlib, with environments built on the Farama Foundation's Gymnasium and PettingZoo libraries.
However, many of the environments I am trying to recreate require access to the agent's entire policy function, not just a single sampled action. This is necessary because the state evolution depends not only on the action taken in the current state but also on the actions the agent would take in various other, hypothetical states.
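To make this concrete, here is a minimal sketch of the kind of environment I mean. The names (`PolicyDependentEnv`, `policy_fn`) and the toy dynamics are made up purely for illustration; the key point is that `step` needs to query the policy at a hypothetical state:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class PolicyDependentEnv(gym.Env):
    """Toy environment whose transition depends on the action the agent
    *would* take at a hypothetical state, not just the sampled action."""

    def __init__(self, policy_fn):
        # policy_fn maps an observation to an action; the env queries it
        # at a hypothetical state to compute the next state.
        self.policy_fn = policy_fn
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)
        self.state = np.zeros(1, dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = np.zeros(1, dtype=np.float32)
        return self.state.copy(), {}

    def step(self, action):
        # The next state depends on the action taken here *and* on the
        # action the policy would choose at a perturbed, counterfactual state.
        hypothetical_obs = np.clip(self.state + 0.5, -1.0, 1.0)
        counterfactual_action = self.policy_fn(hypothetical_obs)
        drift = 0.1 * (float(action) + float(counterfactual_action))
        self.state = np.clip(self.state + drift, -1.0, 1.0).astype(np.float32)
        reward = float(-np.abs(self.state).sum())
        return self.state.copy(), reward, False, False, {}
```

The difficulty is that the standard training loops in these libraries never hand the environment a handle to the current policy, so I don't see how to supply something like `policy_fn` without breaking compatibility.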
Is there a way to implement this sort of thing in Gymnasium/PettingZoo that is compatible with standard RL libraries? In all of the examples I've found, the environment's step method takes only a single action conditioned on a single state observation.
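For reference, every example I've seen follows the standard Gymnasium interaction loop, where the environment receives exactly one action per step:

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()
for _ in range(100):
    action = env.action_space.sample()  # one action for the current obs
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```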
Any help would be much appreciated. Thank you!