I have a Python code that uses a function (let's call it integrator_MPI) that is parallelized with MPI. I am executing this code on an HPC cluster by submitting a job whose relevant lines are:
#!/bin/bash
#SBATCH --job-name=Job # create a short name for your job
#SBATCH --cpus-per-task=12 # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=5G # memory per cpu-core (4G is default)
#SBATCH --time=60:00:00 # total run time limit (HH:MM:SS)
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so # This is required by the cluster
export I_MPI_FABRICS=shm:ofa # This is required by the cluster
mpirun -n 12 python3 my_code.py
Submitted this way, the code works without any problem.
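I am not sure whether my resource request actually matches the mpirun call: as I understand it, on SLURM the number of MPI ranks is usually requested with --ntasks (one rank per task), while --cpus-per-task is meant for threads within a single rank. For comparison, a version of the request written that way would look roughly like this (a sketch, keeping the cluster-specific exports unchanged):
#!/bin/bash
#SBATCH --job-name=Job # create a short name for your job
#SBATCH --ntasks=12 # 12 MPI ranks, one per task
#SBATCH --cpus-per-task=1 # one CPU core per rank
#SBATCH --mem-per-cpu=5G # memory per cpu-core
#SBATCH --time=60:00:00 # total run time limit (HH:MM:SS)
export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so
export I_MPI_FABRICS=shm:ofa
mpirun -n 12 python3 my_code.py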
But when I modify my Python code to call the integrator_MPI() function multiple times,
for i in range(Ntimes):
    # ... code block that alters the input/data ...
    integrator_MPI()
only the first iteration is computed and the code never stops running. This only happens when I run it on the cluster; on my laptop the loop works fine.
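For context, the overall structure of my_code.py is roughly the following (a minimal self-contained sketch using mpi4py, which I assume here for illustration; the body of integrator_MPI is just a stand-in reduction, not my real integrator):
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def integrator_MPI():
    # stand-in for the real MPI-parallel integrator: every rank contributes
    # a partial result and the pieces are combined with a collective reduction
    local = float(rank)  # placeholder for this rank's share of the work
    return comm.allreduce(local, op=MPI.SUM)

Ntimes = 5  # illustrative value
for i in range(Ntimes):
    # ... code block that alters the input/data ...
    result = integrator_MPI()
    if rank == 0:
        print(f"iteration {i}: result = {result}", flush=True)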
Should I write the loop in the job file instead, or is there a way to work around this?
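By "loop in the job file" I mean something roughly like this (a sketch; it assumes my_code.py could take the iteration index as a command-line argument, which it currently does not):
NTIMES=5 # illustrative; the real value comes from my setup
for i in $(seq 0 $((NTIMES - 1))); do
    mpirun -n 12 python3 my_code.py "$i" # one full MPI launch per iteration
done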