OpenMPI MPI_Send and MPI_Recv structure hanging

I'm implementing an MPI communication structure in Fortran in which the master determines, for each slave, the size of the domain it will have to work on later. I'm running the following code on 9 processes (one master and 8 slaves):

if (rank==root) then  

  ! (Some irrelevant calculations here)

  ! Send information to slaves
  do i=1,numslaves
    print*, "Sending information to process", i
    nrowLocal=nrowLocalArray(i)
    mpiTag=1
    call MPI_SEND(nrowLocal,1,MPI_INTEGER,i,mpiTag,MPI_COMM_WORLD,ierr)
    print*, "ierr send1=", ierr
    call MPI_RECV(ctrl,1,MPI_INTEGER,i,mpiTag,MPI_COMM_WORLD,status,ierr)
    print*, "ierr recv1=", ierr

    firstRow = firstRowArray(i)
    mpiTag=2
    call MPI_SEND(firstRow,1,MPI_INTEGER,i,mpiTag,MPI_COMM_WORLD,ierr)
    print*, "ierr send2=", ierr
    call MPI_RECV(ctrl,1,MPI_INTEGER,i,mpiTag,MPI_COMM_WORLD,status,ierr)
    print*, "ierr recv2=", ierr

  end do

  print*, "Master distributed a total of ", nrow, " among ", numslaves, " slaves"

else  ! Slaves part
  print*, "ready, process", rank
  nrowGlobal = nrow

  ! Get information from master
  mpiTag=1
  call MPI_RECV(nrowLocal,1,MPI_INTEGER,root,mpiTag,MPI_COMM_WORLD,status,ierr)
  call MPI_SEND(rank,1,MPI_INTEGER,root,mpiTag,MPI_COMM_WORLD,ierr)

  mpiTag=2
  call MPI_RECV(firstRow,1,MPI_INTEGER,root,mpiTag,MPI_COMM_WORLD,status,ierr)
  call MPI_SEND(rank,1,MPI_INTEGER,root,mpiTag,MPI_COMM_WORLD,ierr)

  nrow = nrowLocal
  lastRow = firstRow + nrow
 print*, "Process number ", rank, " has a total of ", nrowLocal, " rows, starting at row", firstRow

end if

This code is contained in a subroutine that is called from another part of the program. All variables are declared in a separate module, which is used here.
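
For context, the overall layout looks roughly like this; the module and subroutine names below are placeholders, not the ones from my actual code:

module shared_vars                  ! placeholder name for the module that is use'd everywhere
  use mpi                           ! or: include 'mpif.h'
  implicit none
  integer :: rank, root, ierr, ctrl, i, mpiTag
  integer :: nrow, nrowGlobal, nrowLocal, firstRow, lastRow, numslaves
  integer, allocatable :: status(:), nrowLocalArray(:), firstRowArray(:)
end module shared_vars

subroutine distribute_rows          ! placeholder name for the subroutine shown above
  use shared_vars
  ! ... the if (rank==root) ... else ... end if block from above goes here ...
end subroutine distribute_rows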

My problem is that the code hangs after the master has sent the information to one or two slaves.

The output from print* to the log file tells me the following:

  • All slaves print the "ready" message, so it can't be that the program is simply waiting for a process that's not yet initialized.
  • The slaves that successfully execute this part of the code carry on and write output from the next part of the program.
  • The ierr error codes printed are always 0.
  • The last message printed by the master is "Sending information to process" with the number of the slave that would be next in line.

At first, I tried without the MPI_SEND from the slaves back to the master (and the corresponding MPI_RECV on the master), then inserted them, thinking they might solve the problem, but it didn't change anything.
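
For reference, the first version (before I added the acknowledgements) looked roughly like this (a sketch reconstructed from the code above, not the exact original):

! Master side, first version: only the two sends per slave
do i=1,numslaves
  mpiTag=1
  call MPI_SEND(nrowLocalArray(i),1,MPI_INTEGER,i,mpiTag,MPI_COMM_WORLD,ierr)
  mpiTag=2
  call MPI_SEND(firstRowArray(i),1,MPI_INTEGER,i,mpiTag,MPI_COMM_WORLD,ierr)
end do

! Slave side, first version: only the two matching receives
mpiTag=1
call MPI_RECV(nrowLocal,1,MPI_INTEGER,root,mpiTag,MPI_COMM_WORLD,status,ierr)
mpiTag=2
call MPI_RECV(firstRow,1,MPI_INTEGER,root,mpiTag,MPI_COMM_WORLD,status,ierr)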

Also, I tried using MPI_ANY_TAG in the SEND and RECV statements instead of defining the tag myself, but in that case not even the SEND was executed.

I've read in some other answers that a large buffer size might be a problem, but since I'm only sending one integer at a time, I don't think that's relevant here.

I'm using OpenMPI Intel 1.4.3 on Linux (CentOS 5.5).

Can anyone spot what's wrong here? Any help is greatly appreciated! I might be missing something obvious as I'm quite new to MPI and Fortran.


EDITS:

1) status is declared in a separate module like this:

integer,allocatable :: status(:)

and allocated after MPI initialization:

allocate(status(MPI_STATUS_SIZE))
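
That is, roughly in this order:

call MPI_INIT(ierr)
call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
allocate(status(MPI_STATUS_SIZE))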

2) The output looks like this:

 ready, process          1
 ready, process          7
 ready, process          4
 ready, process          6
 ready, process          3
 ready, process          5
 ready, process          2
 Sending information to process           1
 ierr send1=           0
 ierr recv1=           0
 ierr send2=           0
 Process number            1  has a total of           21  rows, starting at row
 ierr recv2=           0
           1
 (Output from the next part of the program from Process 1)
 ready, process           8
 Sending information to process           2

Note: the lone 1 on the fourth line from the bottom is the continuation of the "starting at row" line six lines from the bottom (the output from process 1 got interleaved with the master's).
