My objective is to run multiple reinforcement learning programs at the same time using the Stable-Baselines3 library. What I notice is that as I increase the number of programs, the iteration speed of each program gradually decreases, which is surprising since each program should be running in its own process (on its own core).
Here is my program:
from joblib import Parallel, delayed
import gym
# from sbx import SAC
import torch
from stable_baselines3 import SAC

def train():
    env = gym.make("Humanoid-v4")
    model = SAC("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=int(7e5), progress_bar=True)

if __name__ == '__main__':
    num_of_programs = 1
    Parallel(n_jobs=10)(delayed(train)() for i in range(num_of_programs))
num_of_programs is used to control the number of programs I am trying to run in parallel.
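One thing I wondered about is PyTorch's intra-op threading: by default each process can spawn as many compute threads as there are cores, so parallel workers might contend for the same CPUs. A minimal sketch of capping each worker at a single thread (just a guess at the cause on my part, not something I have verified) looks like this:

```python
import os

# Hypothetical mitigation: cap the BLAS/OpenMP thread pools to one
# thread per process. These variables must be set before torch (or
# numpy) is imported for them to take effect.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
```

With joblib, this would need to run at the top of the worker function (or of the module) so each spawned process inherits the setting before importing torch.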
Here are some statistics:

Number of programs    Iteration speed (per program)
1                     ~102 it/s
3                     ~60 it/s
10                    ~20 it/s
I made sure to request enough resources so that there shouldn't be a resource constraint. This is how I request my resources using Slurm:

srun --time=10:00:00 --nodes=1 --cpus-per-task=16 --mem=32G --partition=gpu --gres=gpu:a100-pcie:1 --pty /usr/bin/bash

So I have 16 CPUs, 32 GB of memory, and a 40 GB A100 GPU.
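To double-check how many CPUs the job can actually use (as opposed to what was requested), the process affinity mask can be inspected from inside the job; under Slurm this reflects the cgroup/affinity limit rather than the machine total:

```python
import os

# Number of CPUs this process is allowed to run on (Linux only).
usable_cpus = len(os.sched_getaffinity(0))
print(f"CPUs usable by this process: {usable_cpus}")
```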
I noticed the same issue when I moved from stable_baselines3 to sbx: while stable_baselines3 uses PyTorch as its deep learning library, sbx uses JAX.