How do I discretise a continuous observation and action space in Python?


My professor has asked me to apply a Policy Iteration method to the Pendulum-v1 environment in OpenAI Gym.

Pendulum-v1 has the following observation and action spaces:

Observation

Type: Box(3)

Num  Observation  Min   Max
0    cos(theta)   -1.0  1.0
1    sin(theta)   -1.0  1.0
2    theta dot    -8.0  8.0

Actions

Type: Box(1)

Num  Action        Min   Max
0    Joint effort  -2.0  2.0
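
For context, here is roughly what I had in mind for the discretisation itself, i.e. binning each Box dimension with np.digitize and picking a finite set of torques (the bin counts are arbitrary choices on my part, not from the spec):

```python
import numpy as np
import gym

env = gym.make("Pendulum-v1")

# Arbitrary bin counts per dimension (my own choice)
N_OBS_BINS = (11, 11, 17)   # cos(theta), sin(theta), theta dot
N_ACT_BINS = 9              # joint effort

obs_low, obs_high = env.observation_space.low, env.observation_space.high
act_low, act_high = env.action_space.low[0], env.action_space.high[0]

# Interior bin edges for each observation dimension
obs_edges = [np.linspace(lo, hi, n + 1)[1:-1]
             for lo, hi, n in zip(obs_low, obs_high, N_OBS_BINS)]

# Finite set of torques: evenly spaced over [-2, 2]
actions = np.linspace(act_low, act_high, N_ACT_BINS)

def discretise_obs(obs):
    """Map a continuous observation to a single integer state index."""
    idx = [np.digitize(o, edges) for o, edges in zip(obs, obs_edges)]
    return np.ravel_multi_index(idx, N_OBS_BINS)

n_states = int(np.prod(N_OBS_BINS))
n_actions = N_ACT_BINS
```

I am not sure whether binning cos(theta) and sin(theta) independently like this is sensible, since most of those bin combinations can never actually occur.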

From my understanding, Policy Iteration requires discrete actions, discrete observations, and a known transition probability function, as in the FrozenLake OpenAI Gym environment. I know there are methods designed for continuous (Box) spaces, but the requirement is to apply a "correct" Policy Iteration method and then explain why it doesn't work.
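
The only idea I have for the "probability function" part is to exploit the fact that the pendulum dynamics are deterministic and build a tabular model by stepping the environment from every discretised state. The sketch below assumes the helpers above, and that the internal attribute env.unwrapped.state = [theta, theta_dot] can be set directly (which is not part of the official Gym API):

```python
# Deterministic tabular model: next-state index and reward for every (s, a).
P = np.zeros((n_states, n_actions), dtype=int)
R = np.zeros((n_states, n_actions))

def bin_centres(lo, hi, n):
    """Centre of each of the n equal-width bins over [lo, hi]."""
    edges = np.linspace(lo, hi, n + 1)
    return (edges[:-1] + edges[1:]) / 2

centres = [bin_centres(lo, hi, n)
           for lo, hi, n in zip(obs_low, obs_high, N_OBS_BINS)]

pend = env.unwrapped        # bypass the TimeLimit wrapper
env.reset()

for s in range(n_states):
    cos_i, sin_i, vel_i = np.unravel_index(s, N_OBS_BINS)
    # Recover a representative (theta, theta_dot) for this discretised state
    theta = np.arctan2(centres[1][sin_i], centres[0][cos_i])
    theta_dot = centres[2][vel_i]
    for a_idx, torque in enumerate(actions):
        pend.state = np.array([theta, theta_dot])   # force the internal state
        obs, reward, *_ = pend.step(np.array([torque], dtype=np.float32))
        P[s, a_idx] = discretise_obs(obs)
        R[s, a_idx] = reward
```

Is this a legitimate way to turn the environment into the kind of tabular MDP that Policy Iteration expects, or am I missing something?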

Does anyone have a source, know of a code repo, or could help me with how I would discretise the action and observation data and apply Policy Iteration to it, as in the sketches above? Everything I have read has told me this is a bad way to solve this problem, and I cannot seem to find anyone who has actually implemented this method on Pendulum-v1.
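
For completeness, this is the standard Policy Iteration loop I intend to run on the resulting tabular model (the discount factor and convergence threshold are my own choices); I would then explain in the write-up why the resulting policy performs poorly:

```python
gamma = 0.95                                   # discount factor (my choice)
policy = np.zeros(n_states, dtype=int)
V = np.zeros(n_states)
all_s = np.arange(n_states)

while True:
    # Policy evaluation: iterate the Bellman backup under the current policy
    while True:
        V_new = R[all_s, policy] + gamma * V[P[all_s, policy]]
        if np.max(np.abs(V_new - V)) < 1e-6:
            V = V_new
            break
        V = V_new
    # Policy improvement: act greedily with respect to the evaluated values
    Q = R + gamma * V[P]                       # shape (n_states, n_actions)
    new_policy = np.argmax(Q, axis=1)
    if np.array_equal(new_policy, policy):
        break                                  # policy is stable, so stop
    policy = new_policy

# Map a raw continuous observation to the torque chosen by the learned policy
greedy_action = lambda obs: actions[policy[discretise_obs(obs)]]
```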
