I want to create a parallel program that makes heavy use of SCALAPACK. The basis of SCALAPACK is BLACS, which itself relies on MPI for interprocess communication.
I want to start the program with a defined number of processes (e.g. the number of cores on the machine) and let the algorithm decide how to use these processes for the calculations.
As a test case I wanted to use 10 processes: 9 of them should be arranged in a grid (via BLACS_GRIDINIT), and the 10th process should wait until the other processes are finished.
Unfortunately, OpenMPI crashes because the last process doesn't get an MPI context from BLACS, while the others do.
Question: What is the correct way to use BLACS with more processes than needed?
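To make the intended layout concrete, here is a rough sketch (not my real code; the program name and the MPI_BARRIER over MPI_COMM_WORLD are only illustrative assumptions for how the spare process might wait for the grid processes):

      PROGRAM LAYOUT
*     Sketch only: 10 processes, 9 of them in a 3x3 grid, the 10th
*     idles until everyone meets at the barrier.
      INCLUDE 'mpif.h'
      INTEGER IAM, NPROCS, CONTXT, NPROW, NPCOL, MYPROW, MYPCOL, IERR
      CALL BLACS_PINFO(IAM, NPROCS)
      NPROW = 3
      NPCOL = 3
      CALL BLACS_GET(0, 0, CONTXT)
      CALL BLACS_GRIDINIT(CONTXT, 'Row', NPROW, NPCOL)
      CALL BLACS_GRIDINFO(CONTXT, NPROW, NPCOL, MYPROW, MYPCOL)
      IF ( (MYPROW.LT.NPROW) .AND. (MYPCOL.LT.NPCOL) ) THEN
*        The 9 grid processes would do the ScaLAPACK work here
      END IF
*     All 10 processes (grid members and the spare one) wait here
      CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)
      CALL BLACS_EXIT(0)
      END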
I did some experiments with additional MPI_INIT and MPI_FINALIZE calls, but none of my attempts were successful.
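Roughly, the variants I tried looked like taking ownership of the MPI lifetime myself (just a sketch, not my actual test code; it assumes the usual mpif.h interface):

      PROGRAM WRAPPED
*     Sketch only: initialize and finalize MPI explicitly, and tell
*     BLACS via a nonzero BLACS_EXIT argument not to finalize MPI.
      INCLUDE 'mpif.h'
      INTEGER IERR, IAM, NPROCS
      CALL MPI_INIT(IERR)
*     BLACS_PINFO should notice that MPI is already initialized
      CALL BLACS_PINFO(IAM, NPROCS)
*     ... grid setup and work as in the sample code below ...
*     Free BLACS resources but leave MPI running (nonzero argument)
      CALL BLACS_EXIT(1)
      CALL MPI_FINALIZE(IERR)
      END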
I started with the sample code from Intel MKL (shortened a little bit):
      PROGRAM HELLO
*     -- BLACS example code --
*     Written by Clint Whaley 7/26/94
*     Performs a simple check-in type hello world
*     ..
*     .. External Functions ..
      INTEGER BLACS_PNUM
      EXTERNAL BLACS_PNUM
*     ..
*     .. Variable Declaration ..
      INTEGER CONTXT, IAM, NPROCS, NPROW, NPCOL, MYPROW, MYPCOL
      INTEGER ICALLER, I, J, HISROW, HISCOL
*     Determine my process number and the number of processes in
*     machine
      CALL BLACS_PINFO(IAM, NPROCS)
*     Set up process grid that is as close to square as possible
      NPROW = INT( SQRT( REAL(NPROCS) ) )
      NPCOL = NPROCS / NPROW
*     Get default system context, and define grid
      CALL BLACS_GET(0, 0, CONTXT)
      CALL BLACS_GRIDINIT(CONTXT, 'Row', NPROW, NPCOL)
      CALL BLACS_GRIDINFO(CONTXT, NPROW, NPCOL, MYPROW, MYPCOL)
*     If I'm not in grid, go to end of program
      IF ( (MYPROW.GE.NPROW) .OR. (MYPCOL.GE.NPCOL) ) GOTO 30
*     Get my process ID from my grid coordinates
      ICALLER = BLACS_PNUM(CONTXT, MYPROW, MYPCOL)
*     If I am process {0,0}, receive check-in messages from
*     all nodes
      IF ( (MYPROW.EQ.0) .AND. (MYPCOL.EQ.0) ) THEN
         WRITE(*,*) ' '
         DO 20 I = 0, NPROW-1
            DO 10 J = 0, NPCOL-1
               IF ( (I.NE.0) .OR. (J.NE.0) ) THEN
                  CALL IGERV2D(CONTXT, 1, 1, ICALLER, 1, I, J)
               END IF
*              Make sure ICALLER is where we think in process grid
               CALL BLACS_PCOORD(CONTXT, ICALLER, HISROW, HISCOL)
               IF ( (HISROW.NE.I) .OR. (HISCOL.NE.J) ) THEN
                  WRITE(*,*) 'Grid error! Halting . . .'
                  STOP
               END IF
               WRITE(*, 3000) I, J, ICALLER
   10       CONTINUE
   20    CONTINUE
         WRITE(*,*) ' '
         WRITE(*,*) 'All processes checked in. Run finished.'
*     All processes but {0,0} send process ID as a check-in
      ELSE
         CALL IGESD2D(CONTXT, 1, 1, ICALLER, 1, 0, 0)
      END IF
   30 CONTINUE
      CALL BLACS_EXIT(0)
*     (Formats 1000 and 2000 are unused leftovers from the full sample)
 1000 FORMAT('How many processes in machine?')
 2000 FORMAT(I4)
 3000 FORMAT('Process {',I2,',',I2,'} (node number =',I4,
     $       ') has checked in.')
      STOP
      END
Update: I investigated the BLACS source code to see what happens there. The call to BLACS_PINFO initializes the MPI context with MPI_INIT if that hasn't happened before. This means that at this point everything works as expected.
At the end, the call to BLACS_EXIT(0) should free all BLACS resources and, since the argument is 0, it should also call MPI_FINALIZE. Unfortunately, this doesn't work as expected and my last process never calls MPI_FINALIZE.
As a workaround, one could ask MPI_FINALIZED and call MPI_FINALIZE if necessary.
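A minimal sketch of that workaround, assuming the usual mpif.h interface (MPI_FINALIZED may legally be called even after MPI_FINALIZE, so this is safe on processes where BLACS already finalized MPI):

      PROGRAM SAFEEXIT
*     Sketch of the workaround: let BLACS try to clean up, then
*     finalize MPI ourselves if it is still active on this process.
      INCLUDE 'mpif.h'
      LOGICAL DONE
      INTEGER IERR
*     ... BLACS/ScaLAPACK work as in the example above ...
      CALL BLACS_EXIT(0)
      CALL MPI_FINALIZED(DONE, IERR)
      IF (.NOT. DONE) CALL MPI_FINALIZE(IERR)
      END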
Update 2: My previous attempts were made with Intel Studio 2013.0.079 and OpenMPI 1.6.2 on SUSE Linux Enterprise Server 11.
After reading ctheo's answer, I tried to compile this example with the tools shipped with Ubuntu 12.04 (gfortran 4.6.3, OpenMPI 1.4.3, BLACS 1.1) and it worked.
My conclusion is that Intel's implementation appears to be buggy. I will retry this example in the not-so-distant future with the newest service release of Intel Studio, but I don't expect any changes.
However, I would appreciate any other (and maybe better) solution.
I don't know the answer, and I would hazard a guess that the intersection of the set of people who participate in SO with the set of people who know the answer to your question has size < 1. However, I'd suggest that you might have slightly better luck asking on scicomp or by contacting the ScaLAPACK team at the University of Tennessee directly through their support page. Good luck!