My professor has asked me to apply Policy Iteration to the Pendulum-v1 environment from OpenAI Gym.
Pendulum-v1 has the following observation and action spaces:
Observation
Type: Box(3)
Num | Observation | Min | Max |
---|---|---|---|
0 | cos(theta) | -1.0 | 1.0 |
1 | sin(theta) | -1.0 | 1.0 |
2 | theta dot | -8.0 | 8.0 |
Actions
Type: Box(1)
Num | Action | Min | Max |
---|---|---|---|
0 | Joint effort | -2.0 | 2.0 |
From my understanding, Policy Iteration requires a finite set of states, a finite set of actions, and known transition probabilities, as in the FrozenLake environment. I know there are methods designed for continuous Box spaces, but the requirement is to apply a "correct" Policy Iteration method and explain why it doesn't work.
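For concreteness, this is roughly what I understand by tabular Policy Iteration. It is only a minimal sketch under my own assumptions: the model format (`P[s][a]` as a list of `(probability, next_state)` pairs and `R[s][a]` as the expected reward), the function name, and the two-state toy problem are all made up for illustration and do not come from Gym.

```python
def policy_iteration(P, R, gamma=0.9, tol=1e-8):
    """Tabular policy iteration (illustrative sketch).

    P[s][a] is a list of (probability, next_state) pairs (assumed format).
    R[s][a] is the expected immediate reward for action a in state s.
    """
    n_states, n_actions = len(P), len(P[0])
    policy = [0] * n_states
    V = [0.0] * n_states

    def q(s, a):
        # One-step lookahead using the current value estimates.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    while True:
        # Policy evaluation: sweep until the values stop changing.
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = q(s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break

        # Policy improvement: act greedily with respect to V.
        stable = True
        for s in range(n_states):
            best = max(range(n_actions), key=lambda a: q(s, a))
            if q(s, best) > q(s, policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical two-state problem: staying in state 1 pays 1 per step.
toy_P = [[[(1.0, 0)], [(1.0, 1)]],   # state 0: action 0 stays, action 1 moves to 1
         [[(1.0, 1)], [(1.0, 0)]]]   # state 1: action 0 stays, action 1 moves to 0
toy_R = [[0.0, 0.0],
         [1.0, 0.0]]

toy_policy, toy_V = policy_iteration(toy_P, toy_R)
print(toy_policy)  # -> [1, 0]
```

Both loops run over every state and every action, which is why I believe the method needs finite sets and an explicit model.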
Does anyone have a source, know of a code repo, or could help me work out how to discretise the action and observation spaces and then apply Policy Iteration to them? Everything I have read says this is a bad way to solve the problem, and I cannot find anyone who has actually implemented it on Pendulum-v1.
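To show the kind of discretisation I have in mind, here is a rough sketch. Everything in it is an assumption on my part: the grid sizes are arbitrary, the helper names are hypothetical, and the constants and update rule are what I believe the Gym pendulum uses, written out by hand instead of being read from the environment. It builds a deterministic tabular model where `P[s][a]` is a list of `(probability, next_state)` pairs and `R[s][a]` is the reward.

```python
import math

# Assumed constants, based on my reading of the Gym Pendulum source.
G, M, L, DT = 10.0, 1.0, 1.0, 0.05
MAX_SPEED, MAX_TORQUE = 8.0, 2.0

# Hypothetical grid sizes; finer grids mean many more states.
N_THETA, N_THDOT, N_TORQUE = 21, 21, 5


def bin_centres(lo, hi, n):
    """Centres of n equal-width bins covering [lo, hi]."""
    width = (hi - lo) / n
    return [lo + (i + 0.5) * width for i in range(n)]


def to_bin(x, lo, hi, n):
    """Index of the bin containing x, clamped to [0, n - 1]."""
    i = int((x - lo) / (hi - lo) * n)
    return min(max(i, 0), n - 1)


def wrap(theta):
    """Map an angle to [-pi, pi)."""
    return (theta + math.pi) % (2 * math.pi) - math.pi


THETAS = bin_centres(-math.pi, math.pi, N_THETA)
THDOTS = bin_centres(-MAX_SPEED, MAX_SPEED, N_THDOT)
TORQUES = [-MAX_TORQUE + i * 2 * MAX_TORQUE / (N_TORQUE - 1)
           for i in range(N_TORQUE)]


def state_index(theta, thdot):
    """Flatten the (theta bin, theta_dot bin) pair into one state id."""
    i = to_bin(wrap(theta), -math.pi, math.pi, N_THETA)
    j = to_bin(thdot, -MAX_SPEED, MAX_SPEED, N_THDOT)
    return i * N_THDOT + j


def obs_to_state(obs):
    """Convert a (cos(theta), sin(theta), theta_dot) observation to a state id."""
    cos_t, sin_t, thdot = obs
    return state_index(math.atan2(sin_t, cos_t), thdot)


def build_model():
    """Simulate one step from every bin centre under every discrete torque."""
    P, R = [], []
    for th in THETAS:
        for thdot in THDOTS:
            row_p, row_r = [], []
            for u in TORQUES:
                new_thdot = thdot + (3 * G / (2 * L) * math.sin(th)
                                     + 3.0 / (M * L ** 2) * u) * DT
                new_thdot = min(max(new_thdot, -MAX_SPEED), MAX_SPEED)
                new_th = th + new_thdot * DT
                row_p.append([(1.0, state_index(new_th, new_thdot))])
                row_r.append(-(th ** 2 + 0.1 * thdot ** 2 + 0.001 * u ** 2))
            P.append(row_p)
            R.append(row_r)
    return P, R


P, R = build_model()
print(len(P), len(P[0]))  # -> 441 5
```

One thing I already notice with this grid: from the upright, stationary bin, even the maximum torque leaves the pendulum in the same bin after one step, so the tabular model treats that action as having no effect. I suspect this is part of why the approach is said to work badly, but I would like to confirm it.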