mpirun does not distribute the job across nodes properly


I am trying to run a Python program with mpirun, and I have encountered two problems:

  1. As you can see from the output below, the processes are not distributed to dirac5 and dirac7. Instead, mpirun fills the nodes in the order they are listed in the hostfile, so all 15 ranks end up on dirac4.
  2. Each server has at most 20 cores, but if I assign more than 20 cores to the calculation, the log file is created and nothing is written to it. For example, with dirac4:20, dirac5:20 and dirac7:20 in the hostfile, 'mpirun -f hostfile -n 60 python mpi_test.py > mpi_test.log' leaves the log empty. However, when I run 'top' on each node, it shows that the calculation is running, but memory is not being used.

How do I fix these issues?


I'm executing mpirun as follows:

mpirun -f hostfile -n 15 python mpi_test.py > mpi_test.log

...with the hostfile:

dirac4:5
dirac5:5
dirac7:5

and Python source akin to:

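from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()  # index of this process within the communicator
size = comm.Get_size()  # total number of processes launched
node = MPI.Get_processor_name()  # name of the host this rank runs on
print(f"Hello from rank {rank} of {size} on node {node}")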

...with output:

MPI startup(): I_MPI_PM environment variable is not supported.
MPI startup(): Similar variables:
     I_MPI_PIN
     I_MPI_SHM
     I_MPI_PLATFORM
     I_MPI_PMI
     I_MPI_PMI_LIBRARY
MPI startup(): I_MPI_RANK_CMD environment variable is not supported.
MPI startup(): I_MPI_CMD environment variable is not supported.
MPI startup(): Similar variables:
     I_MPI_CC
MPI startup(): To check the list of supported variables, use the impi_info utility or refer to https://software.intel.com/en-us/mpi-library/documentation/get-started.
Hello from rank 0 of 15 on node dirac4
Hello from rank 1 of 15 on node dirac4
Hello from rank 2 of 15 on node dirac4
Hello from rank 3 of 15 on node dirac4
Hello from rank 4 of 15 on node dirac4
Hello from rank 5 of 15 on node dirac4
Hello from rank 6 of 15 on node dirac4
Hello from rank 7 of 15 on node dirac4
Hello from rank 8 of 15 on node dirac4
Hello from rank 9 of 15 on node dirac4
Hello from rank 10 of 15 on node dirac4
Hello from rank 11 of 15 on node dirac4
Hello from rank 12 of 15 on node dirac4
Hello from rank 13 of 15 on node dirac4
Hello from rank 14 of 15 on node dirac4
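
To make the mismatch easier to see, here is a minimal sketch that compares the slot counts requested in the hostfile with the number of ranks each node actually reported in the log. It only parses text and does not launch MPI. The helper names (parse_hostfile, count_ranks_per_node, placement_mismatches) are hypothetical, and the parser assumes every hostfile line has the form host:slots, treating a bare hostname as one slot.

```python
import re
from collections import Counter

# Matches the greeting printed by mpi_test.py; the node name is group 3.
HELLO = re.compile(r"Hello from rank (\d+) of (\d+) on node (\S+)")


def parse_hostfile(text):
    """Return {host: slots} from lines of the form 'host:slots'.

    Assumption: blank lines and lines starting with '#' are skipped,
    and a bare hostname without ':slots' counts as one slot.
    """
    slots = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        host, _, count = line.partition(":")
        slots[host] = int(count) if count else 1
    return slots


def count_ranks_per_node(log_text):
    """Count how many 'Hello from rank ...' lines each node printed."""
    return Counter(m.group(3) for m in HELLO.finditer(log_text))


def placement_mismatches(hostfile_text, log_text):
    """Return {host: (requested, observed)} for hosts where the two differ."""
    requested = parse_hostfile(hostfile_text)
    observed = count_ranks_per_node(log_text)
    # Keep hostfile order, then append any host seen only in the log.
    hosts = list(requested) + [h for h in observed if h not in requested]
    return {
        h: (requested.get(h, 0), observed.get(h, 0))
        for h in hosts
        if requested.get(h, 0) != observed.get(h, 0)
    }
```

Called with the contents of the hostfile and of mpi_test.log shown above, placement_mismatches returns {'dirac4': (5, 15), 'dirac5': (5, 0), 'dirac7': (5, 0)}, which confirms that the per-host counts in the hostfile are not being honoured.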