I'm implementing an MPI communication structure in Fortran where the master determines, for each slave, the size of the domain it will have to work on later. I'm running the following code on 9 processes (one master and 8 slaves):
if (rank==root) then
   ! (Some irrelevant calculations here)

   ! Send information to slaves
   do i=1,numslaves
      print*, "Sending information to process", i
      nrowLocal = nrowLocalArray(i)
      mpiTag = 1
      call MPI_SEND(nrowLocal,1,MPI_INTEGER,i,mpiTag,MPI_COMM_WORLD,ierr)
      print*, "ierr send1=", ierr
      call MPI_RECV(ctrl,1,MPI_INTEGER,i,mpiTag,MPI_COMM_WORLD,status,ierr)
      print*, "ierr recv1=", ierr

      firstRow = firstRowArray(i)
      mpiTag = 2
      call MPI_SEND(firstRow,1,MPI_INTEGER,i,mpiTag,MPI_COMM_WORLD,ierr)
      print*, "ierr send2=", ierr
      call MPI_RECV(ctrl,1,MPI_INTEGER,i,mpiTag,MPI_COMM_WORLD,status,ierr)
      print*, "ierr recv2=", ierr
   end do

   print*, "Master distributed a total of ", nrow, " among ", numslaves, " slaves"

else ! Slaves part
   print*, "ready, process", rank
   nrowGlobal = nrow

   ! Get information from master
   mpiTag = 1
   call MPI_RECV(nrowLocal,1,MPI_INTEGER,root,mpiTag,MPI_COMM_WORLD,status,ierr)
   call MPI_SEND(rank,1,MPI_INTEGER,root,mpiTag,MPI_COMM_WORLD,ierr)
   mpiTag = 2
   call MPI_RECV(firstRow,1,MPI_INTEGER,root,mpiTag,MPI_COMM_WORLD,status,ierr)
   call MPI_SEND(rank,1,MPI_INTEGER,root,mpiTag,MPI_COMM_WORLD,ierr)

   nrow = nrowLocal
   lastRow = firstRow + nrow
   print*, "Process number ", rank, " has a total of ", nrowLocal, " rows, starting at row", firstRow
end if
This code is contained in a subroutine that is called from another part of the program. All variables are declared in a separate module, which is used here.
My problem is that the code hangs after the master has sent the information to 1 or 2 slaves.
The output from the print* statements to the log file tells me the following:
- All slaves print the "ready" message, so it can't be that the program is simply waiting for a process that's not yet initialized.
- The slaves that successfully execute this part of the code carry on and write output from the next part of the program.
- The ierr error codes printed are always 0.
- The last message printed by the master is "Sending information to process" with the number of the slave that would be next in line.
At first, I tried without the MPI_SEND from the slaves back to the master (and the corresponding MPI_RECV on the master's side), then inserted them, thinking they might solve the problem (but it didn't change anything).
Also, I tried using MPI_ANY_TAG in the SEND and RECV statements instead of defining the tag myself, but in that case, not even the SEND is executed.
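As far as I understand, MPI_ANY_TAG is only meaningful as a wildcard on the receive side, not as the tag of an MPI_SEND, so a wildcard receive would look roughly like this (a sketch using my variable names; receivedTag is just an illustrative name):

! Sketch: the wildcard tag only goes into the receive; the matching send
! still needs a concrete, non-negative tag.
call MPI_RECV(nrowLocal, 1, MPI_INTEGER, root, MPI_ANY_TAG, &
              MPI_COMM_WORLD, status, ierr)
! The tag that actually arrived can be read back from the status array:
! receivedTag = status(MPI_TAG)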
I've read in some other answers that a large buffer size might be a problem, but since I'm only sending one integer at a time, I don't think that's relevant here.
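As an aside, I've also wondered whether this per-slave count distribution could be expressed with a collective instead of the send/recv loop. A rough sketch of what I mean (countsAll is a hypothetical buffer sized over all ranks, with a dummy entry for the root itself):

! Sketch only: one integer per rank via MPI_SCATTER instead of the loop.
! Every rank calls this; the send buffer is only significant on the root.
! countsAll(0) is a dummy for the root; countsAll(1:numslaves) would hold
! the per-slave row counts.
countsAll(0) = 0
countsAll(1:numslaves) = nrowLocalArray(1:numslaves)
call MPI_SCATTER(countsAll, 1, MPI_INTEGER, nrowLocal, 1, MPI_INTEGER, &
                 root, MPI_COMM_WORLD, ierr)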
I'm using OpenMPI 1.4.3 (Intel) on Linux (CentOS 5.5).
Can anyone spot what's wrong here? Any help is greatly appreciated! I might be missing something obvious as I'm quite new to MPI and Fortran.
EDITS:
1) status is declared in a separate module like this:
integer, allocatable :: status(:)
and allocated after MPI initialization:
allocate(status(MPI_STATUS_SIZE))
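For comparison, the examples I've seen declare the status array with a fixed size instead of allocating it; this is just the conventional pattern, not necessarily related to my problem:

! Conventional fixed-size declaration (no allocation needed):
integer :: status(MPI_STATUS_SIZE)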
2) The output looks like this:
ready, process 1
ready, process 7
ready, process 4
ready, process 6
ready, process 3
ready, process 5
ready, process 2
Sending information to process 1
ierr send1= 0
ierr recv1= 0
ierr send2= 0
Process number 1 has a total of 21 rows, starting at row
ierr recv2= 0
1
(Output from the next part of the program from Process 1)
ready, process 8
Sending information to process 2
Note: the 1 on the 4th line from the bottom is the continuation of the 6th line from the bottom, i.e. process 1 starts at row 1.