Use srun to execute code once, but with multiple tasks


I am currently trying to run some ORCA calculations on an HPC cluster that does not support ssh. Since ORCA is very write-intensive, I want to run it in memory using the /dev/shm/ location. Instead of using ssh, I have now tried srun, but I am struggling to find the right set of parameters.

The code is basically a loop over the nodes with an inner loop over the number of calculations per node, and each iteration calls srun:

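# Outer loop: nodes; inner loop: calculations per node.
for (( j=1; j<=N_nodes; j++ ))
do
  for (( i=1; i<=N_calc; i++ ))
  do
    # First field of the host file is the node name.
    node=$(awk '{print $1}' "host_${j}_${C_num}")
    # Drop the first five characters of the node name.
    Node_number=${node:5}
    # Per-calculation scratch directory in memory.
    export TMP_DIR="/dev/shm/CALC_${i}_${j}_${C_num}"
    srun --overlap --jobid="$SLURM_JOB_ID" -N 1 --ntasks="$N_Tasks" --cpus-per-task="$CPUs" -w "nid00$Node_number" "$SHELL" execute.sh "$j" "$i"
    sleep 20
  done
done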
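As a side note on the node-name handling in the loop: `${node:5}` drops the first five characters whatever they are, and the `-w` argument then re-adds a literal `nid00`. The sketch below is only an illustration with made-up host names (`nid00123`, `nid01234`) and a hypothetical helper `strip_nid_prefix`. It strips the prefix only when it really is `nid00`, which makes a mismatch visible instead of silently producing a different node name.

```shell
# Hypothetical helper: remove a leading "nid00" if present, otherwise
# return the name unchanged. Uses POSIX prefix removal (${var#pattern}).
strip_nid_prefix() {
  printf '%s\n' "${1#nid00}"
}

# Made-up host names for illustration; the real ones come from the host files.
for name in nid00123 nid01234; do
  suffix=$(strip_nid_prefix "$name")
  echo "$name -> $suffix"
done
```

If the host files already contain full node names, passing `$node` directly to `-w` would avoid the round trip, assuming the names in the files match what Slurm expects.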

Here, execute.sh is a simple script that just loads the modules and then executes the loop_i_j_C_num.sh script. Within that script, an ORCA executable that requires 8 cores is started repeatedly until all tasks are done.

However, for the srun command, I am unsure about the right parameters. I tried two versions:

N_Tasks=1, CPUs=8: This results in ORCA not being able to start the calculations, as it requires 8 MPI tasks, but only one is allowed.

N_Tasks=8, CPUs=1: This results in the execute.sh script being executed 8 times, once per task, instead of just once.
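To make the second case concrete: srun starts one copy of the command per task, and each copy gets its own `SLURM_PROCID` (0 to 7 here). The following is a minimal sketch of a guard that lets only task 0 do any work. The helper name `is_first_task` is hypothetical, and the sketch does not show or verify whether ORCA's internal MPI launch can then use all eight slots of the step. It only illustrates why the script body runs eight times.

```shell
# Hypothetical helper: succeeds only in the first task of a step.
# srun sets SLURM_PROCID for every task; outside of srun it is unset,
# and this sketch then treats the process as the only task.
is_first_task() {
  [ "${SLURM_PROCID:-0}" -eq 0 ]
}

if is_first_task; then
  echo "task ${SLURM_PROCID:-0}: the loop script would be started here"
else
  echo "task ${SLURM_PROCID}: nothing to do"
fi
```

Placed at the top of execute.sh, a check like this would show up as one "would be started" line and seven "nothing to do" lines in the step output when N_Tasks=8.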

How do I tell srun to start the code once, but with 8 CPUs and tasks?

