I see that I have to define player observations to use QMIX + LSTM, as described here https://github.com/ray-project/ray/issues/8407#issuecomment-627401186 or as in this example https://github.com/ray-project/ray/blob/master/rllib/examples/two_step_game.py#L81. However, I don't understand what I should put into `ENV_STATE`.

Is this field for the states a player may be in? Are there any restrictions on them? Are they connected with the observations (the `obs` field next to it) in any way?
`ENV_STATE` represents the dimension of the environment state, and `obs` represents the dimension of the observations. However, it will not magically work for any environment: you have to wrap your observations and the environment state in a dictionary, as in this example https://github.com/ray-project/ray/blob/1.11.1/rllib/examples/env/two_step_game.py#L85, so that your environment returns it after every step and on `reset()`.
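For concreteness, here is a minimal sketch of such an environment. The env name and the sizes (a 5-dim per-agent observation, a 10-dim global state) are made up for illustration, and the imports assume Ray 1.x with gym-style spaces, where `ENV_STATE` (the string `"state"`) lives in `ray.rllib.env.multi_agent_env`:

```python
import numpy as np
from gym.spaces import Box, Dict, Discrete
from ray.rllib.env.multi_agent_env import ENV_STATE, MultiAgentEnv


class MyTwoAgentEnv(MultiAgentEnv):
    """Toy env whose per-agent observation is a dict of "obs" + ENV_STATE."""

    def __init__(self, env_config=None):
        super().__init__()
        self.agents = ["agent_1", "agent_2"]
        # Per-agent observation is 5-dim, the shared global state is 10-dim
        # (both sizes are made up for this sketch).
        self.observation_space = Dict({
            "obs": Box(-1.0, 1.0, shape=(5,)),
            ENV_STATE: Box(-1.0, 1.0, shape=(10,)),
        })
        self.action_space = Discrete(2)

    def _all_obs(self):
        state = np.zeros(10, dtype=np.float32)  # the global env state
        return {
            agent: {
                "obs": np.zeros(5, dtype=np.float32),  # this agent's own view
                ENV_STATE: state,  # the same global state for every agent
            }
            for agent in self.agents
        }

    def reset(self):
        # The dict observation must also be returned on reset().
        return self._all_obs()

    def step(self, action_dict):
        obs = self._all_obs()
        rewards = {agent: 0.0 for agent in self.agents}
        dones = {"__all__": True}
        return obs, rewards, dones, {}
```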
After that, you can use `with_agent_groups`.
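A sketch of that step, reusing the hypothetical env above; the group name is arbitrary, and the `Tuple`-over-members pattern for the group spaces follows the two_step_game example:

```python
from gym.spaces import Tuple
from ray.tune.registry import register_env

env = MyTwoAgentEnv()
grouping = {"group_1": ["agent_1", "agent_2"]}

# QMIX trains one group; its obs/act spaces are Tuples over the members.
register_env(
    "grouped_two_agent_env",
    lambda config: MyTwoAgentEnv(config).with_agent_groups(
        grouping,
        obs_space=Tuple([env.observation_space, env.observation_space]),
        act_space=Tuple([env.action_space, env.action_space]),
    ),
)
```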
As you can see from the QMIX sources, you can also define action masks in the same dictionary: https://github.com/ray-project/ray/blob/1.11.1/rllib/agents/qmix/qmix_policy.py#L93
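Sketching what that could look like for the toy spaces above; `"action_mask"` is the key the linked `qmix_policy.py` unpacks, and the sizes are the same made-up ones as before:

```python
import numpy as np
from gym.spaces import Box, Dict
from ray.rllib.env.multi_agent_env import ENV_STATE

N_ACTIONS = 2  # matches the Discrete(2) action space in the sketch above

# Per-agent observation space extended with an "action_mask" entry.
observation_space = Dict({
    "obs": Box(-1.0, 1.0, shape=(5,)),
    ENV_STATE: Box(-1.0, 1.0, shape=(10,)),
    "action_mask": Box(0.0, 1.0, shape=(N_ACTIONS,)),
})

# A single agent's observation then carries the mask as well: here only
# action 0 is available this step (1.0 = allowed, 0.0 = masked out).
one_agent_obs = {
    "obs": np.zeros(5, dtype=np.float32),
    ENV_STATE: np.zeros(10, dtype=np.float32),
    "action_mask": np.array([1.0, 0.0], dtype=np.float32),
}
```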