I want to execute foo.sh on 2 different nodes. Therefore, I wrote the following script:
#!/home/farago/bin/dash
qsub -N dist -o P -e P-err -V -v "EXECSCRIPT=foo.sh" \
    -l walltime=12:00:00,nodes=2:ppn=1 Cluster_ExecExp_pbsdsh.sh
with Cluster_ExecExp_pbsdsh.sh:
#!/home/farago/bin/dash
#PBS -l nodes=2:ppn=1
#PBS -l walltime=12:00:00
/usr/bin/pbsdsh -v dash $EXECSCRIPT
Strangely, foo.sh is always executed on two CPUs of the same node :(
So: Why does pbs(dsh) schedule my task onto one node, even though I have specified nodes=2:ppn=1? (And do I have to give these parameters in both of my scripts?)
Update: if foo.sh consists of
#!/bin/bash
echo "foostart" >> /home/farago/output.txt
cat $PBS_NODEFILE >> /home/farago/output.txt
echo "fooend" >> /home/farago/output.txt
then I get output.txt:
foostart
cn11
cn11
fooend
foostart
cn11
cn11
fooend
So it seems that giving the parameter -l nodes=2:ppn=1 twice results in both qsub and pbsdsh distributing the job twice. But I still do not understand why the jobs are not scheduled on different machines.
It is only being launched on one node because your job is only running on one node. I'm not sure why your scheduler is placing the job only on cn11, but $PBS_NODEFILE tells you which hosts your job is using: it contains one line per allocated slot, so two cn11 lines mean both slots are on cn11.
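To make that check explicit, here is a minimal sketch that counts the distinct hosts in the node file. The function name count_unique_hosts is my own, not part of PBS, and the sketch assumes the usual node file layout of one hostname per line.

```shell
#!/bin/sh
# count_unique_hosts is a hypothetical helper, not a PBS command.
# It prints the number of distinct hostnames in a node file, assuming
# one hostname per line and one line per allocated slot.
count_unique_hosts() {
    sort -u "$1" | wc -l | tr -d ' '
}

# Inside a job, PBS sets PBS_NODEFILE. Outside a job this block is skipped.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    slots=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
    hosts=$(count_unique_hosts "$PBS_NODEFILE")
    echo "slots: $slots, distinct hosts: $hosts"
fi
```

Put near the top of Cluster_ExecExp_pbsdsh.sh, this would show right away whether the allocation spans two machines, before pbsdsh starts anything.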
Some schedulers combine your request onto 1 node if possible, even if the value for nodes is > 1. This part isn't strange.