- As the simplest case, I define an action space to be `spaces.Discrete(3)`, but sometimes 0 is unavailable and the agent can only sample from 1 and 2. Sometimes 2 is unavailable, or both 1 and 2 are unavailable. How can I tell the agent that some choices are not available?
(Note: By unavailable, I mean that the action is impossible and will not happen, and that its result is undefined, rather than a bad choice that results in a negative reward.)
- In reality, I have `MultiDiscrete` action spaces, and some of the actions are sometimes not available (just as in question 1). Or even worse, actions chosen from those spaces must satisfy some condition. For example, actions from a `Discrete 2 - Discrete 2` `MultiDiscrete` action space must satisfy `f(a1, a2) <= 1`, where `a1` is sampled from the first `Discrete 2` space and `a2` is sampled from the second `Discrete 2` space. But the `f` here is a complex function, not as simple as `a+`, that depends on the current state. If this is the case, how can I tell the agent that some choices are currently unavailable?
I'm not sure how to specify that when constructing the action space, but you can sample actions subject to a condition. For your example 1, you can use a `while` loop to keep sampling from the action space, and only return the result if the condition is satisfied.

Using the same logic, you can sample with conditions from other action spaces. For example, I have a `MultiDiscrete` action space where I require that the sum of the array be no more than 6.
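
The resampling loop described above could be sketched roughly as follows. This is only an illustration, not a gym API: `sample_valid`, `is_valid`, and `max_tries` are hypothetical names, and the helper assumes nothing about the space except that it has a `sample()` method, which gym spaces provide. The retry cap is an added assumption so that the loop cannot spin forever when every action is unavailable.

```python
from typing import Any, Callable


def sample_valid(space: Any,
                 is_valid: Callable[[Any], bool],
                 max_tries: int = 10_000) -> Any:
    """Rejection sampling: redraw from `space` until `is_valid` accepts.

    `space` is any object with a `sample()` method (for example a gym space).
    `is_valid` is a hypothetical user-supplied predicate; it may close over
    the environment's current state to decide what is available right now.
    """
    for _ in range(max_tries):
        action = space.sample()
        if is_valid(action):
            return action
    # Assumption: give up instead of looping forever when nothing is valid.
    raise RuntimeError("no valid action found within max_tries draws")
```

With gym installed, `sample_valid(spaces.Discrete(3), lambda a: a != 0)` would draw only 1 or 2, and `sample_valid(spaces.MultiDiscrete([5, 5, 5]), lambda a: a.sum() <= 6)` would enforce the sum constraint (the sizes `[5, 5, 5]` are made up for illustration). Note that this only constrains random draws from the space.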