mpirun, Python, and task mapping


I have to use two tools together, inside a Python wrapper, on a SLURM system with many cores.

  • The first one is complex and cannot be changed. It is driven from Python directly and uses one MPI task per node, with each task using as many CPUs as the node has. Example: with 4 nodes of 20 cores, the command to run is mpirun -np 4 -oversubscribe --bind-to none python run_solver1.py.
  • The second one (OpenFOAM) is usually launched on its own with mpirun and uses as many tasks as there are CPUs in total. For the same configuration the command would be mpirun -np 80 solver2 -parallel; solver2 then handles the parallel run itself.

What I need is to keep the same syntax as in the first case, but also launch the second solver from the Python script: mpirun -np 4 -oversubscribe --bind-to none python run_solvers.py.
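For context, this is roughly the structure I am aiming for in run_solvers.py (a minimal sketch; run_first_solver and launch_solver2 are hypothetical stand-ins for the two tools, not real names):

from mpi4py import MPI

comm = MPI.COMM_WORLD        # the wrapper's 4 tasks, one per node

# Hypothetical stand-ins, just to show the intended flow.
def run_first_solver():
    pass                     # phase 1: each task drives a whole node

def launch_solver2():
    pass                     # phase 2: must somehow run on all 80 cores

run_first_solver()
comm.Barrier()               # wait until phase 1 is done everywhere
if comm.Get_rank() == 0:
    launch_solver2()         # this launch is what the question is about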

I have been using subprocess to spawn the second solver (the attempts are written out after this list), but if I do:

  • subprocess.check_call(['solver2', '-parallel']): with this configuration, I get only 4 tasks spawned instead of 80.
  • subprocess.check_call('mpirun -np 80 solver2 -parallel'.split()): with this configuration, subprocess returns an error.
  • Same if I add -oversubscribe --bind-to none to the call.
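Written out, the failing subprocess attempts look like this (a sketch; solver2 is assumed to be on the PATH):

import subprocess

# Attempt 1: plain call; it inherits the outer mpirun's environment,
# so solver2 only sees the wrapper's 4 tasks.
subprocess.check_call(['solver2', '-parallel'])

# Attempt 2: nested mpirun; this is the call that errors out.
subprocess.check_call('mpirun -np 80 solver2 -parallel'.split())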

I also tried with mpi4py:

  • MPI.COMM_SELF.Spawn('solver2', args=['-parallel'], maxprocs=80) yields mpi4py.MPI.Exception: MPI_ERR_SPAWN: could not spawn processes.
  • Same for MPI.COMM_WORLD.Spawn('solver2', args=['-parallel'], maxprocs=80).

Is there any way to make this work, and make Python understand that this one command needs to be spawned on 80 processes?

Thanks!

1 Answer

Yes, it is possible to make it work. You can use the mpi4py module to spawn the solver with the correct number of processes. The following code should work:

from mpi4py import MPI

# Spawn 80 instances of solver2; Spawn returns an intercommunicator
# linking this process to the spawned group.
child = MPI.COMM_SELF.Spawn('solver2', args=['-parallel'], maxprocs=80)

# Drop the link once it is no longer needed.
child.Disconnect()

This code will spawn 80 instances of the solver2 process with the given argument.
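
If Spawn keeps failing with MPI_ERR_SPAWN, a likely cause is that the runtime sees no free slots for the 80 new processes. With Open MPI you can pass an MPI.Info object to Spawn to say where they should go. A minimal sketch, assuming Open MPI and a hostfile hosts.txt that lists the 4 nodes with slots=20 each (both assumptions, adjust for your site):

from mpi4py import MPI

# "hostfile" is an Open MPI spawn info key; the file name is an assumption.
info = MPI.Info.Create()
info.Set('hostfile', 'hosts.txt')

child = MPI.COMM_SELF.Spawn('solver2', args=['-parallel'],
                            maxprocs=80, info=info)
child.Disconnect()

Alternatively, a commonly suggested workaround for the nested-mpirun error (untested here, and dependent on the MPI installation) is to strip the OMPI_*/PMI_* variables that the outer mpirun puts in the environment, so the inner mpirun starts a fresh job:

import os
import subprocess

# Drop the variables that mark this process as part of a running MPI job
# (assumes an Open MPI / PMIx based setup).
env = {k: v for k, v in os.environ.items()
       if not k.startswith(('OMPI_', 'PMI_', 'PMIX_'))}
subprocess.check_call(['mpirun', '-np', '80', 'solver2', '-parallel'],
                      env=env)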