I have searched Google and read the documentation.
My local cluster is using SLURM. I want to check the following things: How many cores does each node have? How many cores has each job in the queue reserved?
Any advice would be much appreciated!
You can get most information about the nodes in the cluster with the sinfo command, for instance with:
sinfo --Node --long
You will get condensed information about, among other things, the partition, node state, number of sockets, cores, threads, memory, disk and features. It is slightly easier to read than the output of scontrol show nodes.
As for the number of CPUs for each job, see @Sergio Iserte's answer.
See the manpage here.
To build on @damienfrancois's answer:
I found that sinfo was the most useful, but the command arguments should be different. If you just want to know the cores per node, memory per node, availability, and how much is currently free on each node, just do the following.
For quick node status:
sinfo -o "%n %e %m %a %c %C"
Output looks like:
HOSTNAMES FREE_MEM MEMORY AVAIL CPUS CPUS(A/I/O/T)
m-4-06 301585 950000 up 96 88/8/0/96
m-4-07 654944 950000 up 72 71/1/0/72
m-4-09 628696 950000 up 72 49/23/0/72
c-0-02 36741 115000 up 24 24/0/0/24
c-0-03 47512 115000 up 24 24/0/0/24
m-2-01 699025 950000 up 72 72/0/0/72
HOSTNAMES lists the nodes of the cluster; these are the names to use if you want to submit to a specific node.
FREE_MEM tells you how much memory that node currently has free, in MB.
MEMORY tells you how much memory that node has in total, i.e. when it is unused, in MB.
AVAIL tells you whether the node's partition is up or not (useful if you are having issues).
CPUS tells you the total number of CPUs on that node, assuming it is unused.
CPUS(A/I/O/T) tells you the number of allocated/idle/other/total CPUs. Allocated CPUs are currently in use by jobs and therefore unavailable. Idle CPUs are immediately available for use. Other means they could be down or in some other state, and total just repeats the total number of CPUs.
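If you only care about nodes that still have free cores, the CPUS(A/I/O/T) column can be filtered with awk. This is a minimal sketch, assuming the same format string as above; idle_nodes is a hypothetical helper name, and the -h flag of sinfo is used to suppress the header line.

```shell
# idle_nodes: read sinfo output in the "%n %e %m %a %c %C" format on stdin
# (without the header line) and print "hostname idle_cpus" for every node
# that has at least one idle CPU.
idle_nodes() {
    awk '{
        # The last field is CPUS(A/I/O/T), e.g. 88/8/0/96.
        split($NF, cpus, "/")
        if (cpus[2] > 0) print $1, cpus[2]
    }'
}

# On a cluster (needs SLURM installed):
#   sinfo -h -o "%n %e %m %a %c %C" | idle_nodes
```

For the sample output above, this would print m-4-06 with 8 idle CPUs, m-4-07 with 1 and m-4-09 with 23; the fully allocated nodes are skipped.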
More details on the output of this command and how to format it can be found here.
To see the details of all the nodes, you can use:
For a specific node:
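Both cases could be handled with scontrol. The sketch below assumes the command meant here is scontrol show nodes, as mentioned in the first answer, and that the output contains the NodeName, CPUAlloc and CPUTot fields. node_cpus is a hypothetical helper that condenses the verbose key=value output into one line per node.

```shell
# node_cpus: read `scontrol show node` output on stdin and print
# "nodename allocated/total", taken from the NodeName, CPUAlloc and
# CPUTot fields (field names are an assumption about your SLURM version).
node_cpus() {
    awk '
    function emit() { if (name != "") print name, alloc "/" tot }
    {
        for (i = 1; i <= NF; i++) {
            n = index($i, "=")
            if (n == 0) continue          # skip tokens that are not key=value
            key = substr($i, 1, n - 1)
            val = substr($i, n + 1)
            if (key == "NodeName") { emit(); name = val; alloc = tot = "" }
            else if (key == "CPUAlloc") alloc = val
            else if (key == "CPUTot") tot = val
        }
    }
    END { emit() }'
}

# All nodes (needs SLURM installed):
#   scontrol show nodes | node_cpus
# One node, using a hostname from the sinfo output:
#   scontrol show node m-4-06 | node_cpus
```

Without the helper, the plain scontrol commands in the comments print every attribute of the node, which is more detail than sinfo gives but harder to scan.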
And for the cores of each job, you can use the squeue format specifier %C, for instance:
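A sketch of what this could look like, assuming a short format string of job ID, user, state and CPU count (%i %u %t %C); the format string originally intended here may have differed. cpus_by_user is a hypothetical helper that adds up the reserved cores per user.

```shell
# cpus_by_user: read squeue output in the "%i %u %t %C" format on stdin
# (without the header line) and print "user total_cpus", sorted by user.
cpus_by_user() {
    awk '{ total[$2] += $4 } END { for (u in total) print u, total[u] }' | sort
}

# Cores reserved by each job in the queue (needs SLURM installed):
#   squeue -o "%i %u %t %C"
# Totals per user, with -h suppressing the header line:
#   squeue -h -o "%i %u %t %C" | cpus_by_user
```

For a pending job, %C reports the number of CPUs requested; for a running job, it reports the number allocated.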
More info about format.