MlpPolicy only returns 1 and -1 with action space [-1, 1]

Question


Asked by qwererer2 on 22 November 2020 at 14:14 · 170 views

I am trying to use Stable Baselines to train a PPO2 agent with MlpPolicy. After 100k timesteps, the only actions I get are 1 and -1. I restricted the action space to [-1, 1] and use the action directly as the control signal. Could this be happening because I use the action directly as the control?
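Before changing the algorithm, it can help to confirm the symptom numerically by logging the actions that reach the environment and measuring how many sit exactly on a bound. The sketch below is a minimal, hypothetical diagnostic: `saturation_fraction` is an illustrative helper (not part of Stable Baselines), and the `logged` list stands in for whatever actions you record during rollouts.

```python
def saturation_fraction(actions, low=-1.0, high=1.0, tol=1e-6):
    """Hypothetical helper: fraction of actions lying within `tol` of either bound."""
    if not actions:
        return 0.0
    at_bound = sum(
        1 for a in actions if abs(a - low) <= tol or abs(a - high) <= tol
    )
    return at_bound / len(actions)


# Hypothetical logged actions: 7 of the 8 are pinned at a bound.
logged = [1.0, -1.0, 1.0, 0.3, -1.0, 1.0, -1.0, 1.0]
print(saturation_fraction(logged))  # 0.875
```

A value close to 1.0 means the controller is effectively bang-bang, which points at how actions are sampled and bounded rather than at the environment itself.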

Original Q&A

There is 1 answer

**Nico Bohlinger** · Answer 1 · 2021-01-05T21:16:40+00:00

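This could be a result of the Gaussian (normal) distribution PPO2 uses for continuous actions. A Gaussian is unbounded, so sampled actions can fall outside [-1, 1] and are then clipped to the nearest bound; if the policy's mean drifts past a bound or its standard deviation grows large, most actions end up exactly at 1 or -1. You could use a different algorithm that does not rely on a Gaussian policy, or use PPO with another distribution.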
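To see why an unbounded Gaussian produces mostly ±1 once clipping is involved, the sketch below samples from a normal distribution and clips each sample to [-1, 1]. It assumes, as Stable Baselines' PPO2 runner does for Box action spaces, that out-of-range samples are clipped to the bounds before reaching the environment; the function names and the (mean, std) pairs are illustrative, not taken from the library.

```python
import random


def clipped_gaussian_actions(mean, std, n, low=-1.0, high=1.0, seed=0):
    """Sample n actions from N(mean, std^2) and clip each one to [low, high].

    Mimics an unbounded Gaussian policy paired with a bounded action space
    where out-of-range samples are clipped (an assumption, see lead-in).
    """
    rng = random.Random(seed)
    return [min(max(rng.gauss(mean, std), low), high) for _ in range(n)]


def fraction_at_bounds(actions, low=-1.0, high=1.0):
    """Fraction of actions that were clipped exactly onto a bound."""
    return sum(1 for a in actions if a == low or a == high) / len(actions)


# Narrow and centred vs. wide vs. mean drifted past the upper bound.
for mean, std in [(0.0, 0.3), (0.0, 3.0), (2.5, 1.0)]:
    acts = clipped_gaussian_actions(mean, std, n=10_000)
    print(f"mean={mean}, std={std}: {fraction_at_bounds(acts):.2%} at +/-1")
```

With a narrow, centred distribution almost no samples are clipped, but with a large standard deviation roughly three quarters of them are, and with the mean shifted to 2.5 over 90% land exactly on 1. That matches the "only 1 and -1" symptom.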

Check out the example here: https://github.com/hill-a/stable-baselines/issues/112 and this paper: https://www.ri.cmu.edu/wp-content/uploads/2017/06/thesis-Chou.pdf
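The suggestion to use PPO with another distribution can be illustrated with a Beta distribution, one bounded alternative: its support is [0, 1], so an affine rescale maps every sample into [-1, 1] without clipping. This is a minimal sketch, not Stable Baselines' API; `beta_actions` is a hypothetical helper, and in a real policy the network would output `alpha` and `beta` per action dimension, whereas here they are fixed constants.

```python
import random


def beta_actions(alpha, beta, n, low=-1.0, high=1.0, seed=0):
    """Sample n actions from Beta(alpha, beta), rescaled from [0, 1] to [low, high].

    The Beta distribution has bounded support, so every sample already lies
    inside the action space and no clipping is needed.
    """
    rng = random.Random(seed)
    return [low + (high - low) * rng.betavariate(alpha, beta) for _ in range(n)]


acts = beta_actions(alpha=2.0, beta=5.0, n=10_000)
print(min(acts), max(acts))  # both inside [-1, 1]
# Expected value is -1 + 2 * alpha / (alpha + beta), about -0.43 here.
print(sum(acts) / len(acts))
```

Because no probability mass falls outside the bounds, the policy's gradient is not distorted by clipping, which is the usual argument for bounded distributions in this setting.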
