Python loop not running in SLURM


I have Python code that uses a function (let's call it integrator_MPI) that is parallelized with MPI. I am executing this code on an HPC cluster by submitting a job script whose relevant lines are:

#!/bin/bash
#SBATCH --job-name=Job        # create a short name for your job
#SBATCH --cpus-per-task=12       # cpu-cores per task (>1 if multi-threaded tasks)
#SBATCH --mem-per-cpu=5G         # memory per cpu-core (4G is default)
#SBATCH --time=60:00:00          # total run time limit (HH:MM:SS)


export I_MPI_PMI_LIBRARY=/usr/lib64/libpmi.so # This is required by the cluster
export I_MPI_FABRICS=shm:ofa                  # This is required by the cluster

mpirun -n 12  python3 my_code.py

By doing so the code works without any problem.

But when I modify my Python code to call the integrator_MPI() function multiple times,

for i in range(Ntimes):
    # ... code block that alters the input/data ...
    integrator_MPI()

only the first iteration is computed and the job never finishes. This only happens when I run it on the cluster; on my laptop the loop works fine.

Should I write the loop in the job file instead, or is there a way to work around this?
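
To narrow down where the loop stalls, a stripped-down version with per-rank logging might help. The sketch below is only an illustration under assumptions: it assumes the integrator is built on mpi4py (the post does not say which MPI binding is used), the body of integrator_MPI is a hypothetical stand-in that takes the iteration index, and the barrier is added purely for diagnosis. If one rank never prints "done" for iteration 0 while the others do, that rank is probably still waiting inside a collective call.

```python
try:
    # Assumption: the real integrator uses mpi4py; the post does not confirm this.
    from mpi4py import MPI
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    barrier = comm.Barrier
except ImportError:
    # Serial fallback so the sketch still runs where mpi4py is not installed.
    rank, size = 0, 1

    def barrier():
        pass


def integrator_MPI(i):
    """Hypothetical stand-in for the real MPI-parallel integrator."""
    return i * i


def run(n_times):
    results = []
    for i in range(n_times):
        # flush=True so the message reaches the SLURM output file immediately
        print(f"[rank {rank}/{size}] iteration {i}: start", flush=True)
        results.append(integrator_MPI(i))
        # Every rank must reach this point before any rank starts the next iteration.
        barrier()
        print(f"[rank {rank}/{size}] iteration {i}: done", flush=True)
    return results


if __name__ == "__main__":
    run(3)
```

Launched the same way as in the job script (mpirun -n 12 python3 on this file), each rank should print a start and a done line per iteration; a missing line points at the rank and iteration where the hang occurs.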
