MlpPolicy only returns 1 and -1 with action space [-1, 1]

238 views · Asked by qwererer2 · 1 answer

I am trying to use Stable Baselines to train PPO2 with MlpPolicy. After 100k timesteps, the only actions I get are 1 and -1. I restricted the action space to [-1, 1] and use the action directly as the control signal. Could this be because I use the action directly as the control?
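A minimal sketch of the setup described, assuming a custom Gym environment with a Box(-1, 1) action space (the environment class and its dynamics are hypothetical stand-ins, not the asker's actual code):

```python
import gym
import numpy as np
from gym import spaces
from stable_baselines import PPO2

# Hypothetical stand-in for the environment described in the question:
# a continuous-control task with action space Box(-1, 1), where the
# action is fed directly to the plant as the control signal.
class DirectControlEnv(gym.Env):
    def __init__(self):
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(4,), dtype=np.float32)
        self.state = np.zeros(4, dtype=np.float32)
        self.steps = 0

    def reset(self):
        self.state = np.zeros(4, dtype=np.float32)
        self.steps = 0
        return self.state

    def step(self, action):
        self.state = self.state + 0.01 * action[0]  # action used directly as control
        self.steps += 1
        reward = -float(np.abs(self.state).sum())
        return self.state, reward, self.steps >= 200, {}

model = PPO2('MlpPolicy', DirectControlEnv(), verbose=1)
model.learn(total_timesteps=100_000)
```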
This could be a result of the Gaussian distribution PPO2 uses for its action policy: the Gaussian has unbounded support, so once training pushes its mean outside [-1, 1], the clipping applied to sampled actions pins almost every action to the boundary. You could use a different algorithm that doesn't use a Gaussian, or use PPO with another distribution.
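To see why, here is a small self-contained sketch in plain NumPy (no library internals assumed) of what happens when a Gaussian action distribution's mean drifts outside the clipping bounds:

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose training has pushed the Gaussian policy's mean outside [-1, 1].
mean, std = 2.5, 0.5

# Sample actions and clip them to the action space, as PPO2 does.
raw_actions = rng.normal(mean, std, size=10_000)
clipped = np.clip(raw_actions, -1.0, 1.0)

# Almost every sampled action ends up pinned at the boundary.
frac_at_boundary = np.mean(clipped == 1.0)
print(f"fraction of actions clipped to +1: {frac_at_boundary:.3f}")  # ~0.999
```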
Check out the example here: https://github.com/hill-a/stable-baselines/issues/112 and this paper on the Beta distribution as an alternative policy parameterization for bounded action spaces: https://www.ri.cmu.edu/wp-content/uploads/2017/06/thesis-Chou.pdf
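As one concrete instance of "a different algorithm" (a sketch, not taken from the linked issue): SAC in Stable Baselines squashes its sampled Gaussian through tanh, so actions stay strictly inside the bounds rather than being clipped onto them:

```python
import gym
from stable_baselines import SAC

# Pendulum-v0 has a continuous Box action space; SAC's policy applies tanh
# to the sampled Gaussian, so actions lie inside the bounds instead of
# piling up on the boundary the way a clipped Gaussian does.
env = gym.make('Pendulum-v0')
model = SAC('MlpPolicy', env, verbose=1)
model.learn(total_timesteps=10_000)

obs = env.reset()
action, _ = model.predict(obs, deterministic=True)
print(action)  # within the action bounds, not pinned at the edge
```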