What I'm trying to do is run the same python script on thousands of input csv files. The python script is a single-process script that takes a single file as input and creates an output file with some summaries of the input. I saw the question here and tried something similar, but it did not work (see the end).
My SLURM submission script looks like the one below, with the python script nested inside a bash loop. I'm using the loop because I have a large file with the names of the input csv files (one per line) and use it to dispatch the python script. The script lives in a file, submit_script.sh, and is submitted to the scheduler with sbatch: sbatch submit_script.sh.
I'm also using sem (GNU parallel) to parallelize the python script.
#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks=1024
#SBATCH --ntasks-per-node=128
OUTPUT_DIR='/path/to/output/dir'
INPUT_FILES='/some/path/to/list/of/inputs.txt'
INPUT_DIR='/path/to/input/dir'
while IFS= read -r line; do
FILE_NAME=${INPUT_DIR}/${line}.txt
sem -j -1 --id ${FILE_NAME} python_script.py --input ${FILE_NAME} > ${OUTPUT_DIR}/${line}.txt
done < <(tr -d '\r' < ${INPUT_FILES})
sem --wait
What I was expecting is that with the above SLURM submission I would get 8 nodes, within which I would be able to create 1024 instances of the python script, sectioned into buckets of 128 per node. I'm still getting the hang of SLURM, so I apologize for any misunderstandings on my part.
What is happening is that I do get multiple instances of the script going, but only about 70, not the 1024 I was expecting.
I tried dropping sem and using srun instead (with no extra parameters), but that did not work: it created multiple copies of the script alright, but they all ran with the same input file and the output went into a single file.
What I'm trying to do is get a SLURM allocation with N nodes and spawn some number M of copies of the python script, so that each copy tackles a single file. The python script does not use threads and/or processes.
Many thanks in advance, and I really appreciate any help provided!
First, you need to be careful with your usage of --ntasks-per-node. The directives for sbatch can be a little bit confusing at first:
--nodes: the number of nodes that you are requesting for your job. Note: the number of CPUs per node entirely depends on the cluster you are running on. It could be 1 CPU per node; it could be 4.
--ntasks: the total number of tasks that you are requesting for your job.
--ntasks-per-node: if you specify --ntasks, this is the maximum number of tasks that can run on each node. Otherwise, it's the exact number.
See the doc: https://slurm.schedmd.com/sbatch.html
If you know the number of CPUs per node (such as if you are using a homogeneous cluster), you should just use --ntasks and --cpus-per-task instead. That way you avoid oversubscribing a node (which will throttle your performance).
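For example, if each of your nodes really does have 128 CPUs (which your --ntasks-per-node=128 suggests), a sketch of the directives at the top of submit_script.sh could be as simple as:
#SBATCH --ntasks=1024
#SBATCH --cpus-per-task=1
and SLURM will work out how many nodes that requires.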
Second, you want to use srun, not sem. See here: https://slurm.schedmd.com/srun.html
Third, your script itself is not quite right. When you run sbatch, you say: "please run the contents of this script later". SLURM runs it on the first allocated node.
When you use srun, you immediately run a job step with the available SLURM resources at your disposal. Within an sbatch script, that means you get to pick from the resources that were allocated with the directives (#SBATCH ...). sbatch sets up some environment variables to make life easier.
When you want to partition the work, you need to set that up all by yourself. SLURM won't intelligently "figure it out" -- you'll need to schedule explicitly.
I believe the final result needs to look something more like this (a sketch only, untested; the srun placement flag and the round-robin counter are my reading of the docs, not something verified on your cluster):
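#!/bin/bash
#SBATCH --nodes=8
#SBATCH --ntasks=1024
#SBATCH --ntasks-per-node=128
OUTPUT_DIR='/path/to/output/dir'
INPUT_FILES='/some/path/to/list/of/inputs.txt'
INPUT_DIR='/path/to/input/dir'
NUM_NODES=8
NODE_NUMBER=0
while IFS= read -r line; do
    FILE_NAME=${INPUT_DIR}/${line}.txt
    # Launch one task on one node, placed on node $NODE_NUMBER of the
    # allocation, and background it so the loop keeps dispatching.
    srun --nodes=1 --ntasks=1 --relative=${NODE_NUMBER} \
        python_script.py --input ${FILE_NAME} > ${OUTPUT_DIR}/${line}.txt &
    # Round-robin over the allocated nodes.
    NODE_NUMBER=$(( (NODE_NUMBER + 1) % NUM_NODES ))
done < <(tr -d '\r' < ${INPUT_FILES})
# Wait for all backgrounded job steps before the batch script exits.
wait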
I am not 100% sure about the following line:
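srun --nodes=1 --ntasks=1 --relative=${NODE_NUMBER} \
    python_script.py --input ${FILE_NAME} > ${OUTPUT_DIR}/${line}.txt &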
but I believe you need something like this. IIRC, srun with that syntax should continuously schedule tasks on 1 node, that node being identified by the index $NODE_NUMBER. This executes on each node using round-robin allocation. There are other ways of allocating the work.
As an aside, you might find job arrays useful for your task: https://slurm.schedmd.com/job_array.html
They are a bit easier to use, but you create many more jobs: when you schedule a job array, you create M jobs instead of 1 job with M tasks.
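For example, a minimal sketch (assuming inputs.txt has at most 1000 lines; adjust the range to match, and the %128 suffix caps how many array tasks run at once):
#!/bin/bash
#SBATCH --array=1-1000%128
#SBATCH --ntasks=1
OUTPUT_DIR='/path/to/output/dir'
INPUT_FILES='/some/path/to/list/of/inputs.txt'
INPUT_DIR='/path/to/input/dir'
# Pick the line of inputs.txt whose line number matches this task's index.
line=$(sed -n "${SLURM_ARRAY_TASK_ID}p" ${INPUT_FILES} | tr -d '\r')
python_script.py --input ${INPUT_DIR}/${line}.txt > ${OUTPUT_DIR}/${line}.txt
Each array task then processes exactly one input file, and SLURM handles the scheduling for you.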