I'm trying to adapt the following example program to use as a coarse-grained parallel benchmark in my experiments.
I added the following lines to the code:
      START_TIME = MPI_WTIME()                  ! <- added this
      CALL PDGESV( N, NRHS, MEM( IPA ), 1, 1, DESCA, MEM( IPPIV ),
     $             MEM( IPB ), 1, 1, DESCB, INFO )
*
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         WRITE( NOUT, FMT = * )
         WRITE( NOUT, FMT = * ) 'INFO code returned by PDGESV = ', INFO
         WRITE( NOUT, FMT = * )
         WRITE( NOUT, FMT = * ) 'Matrix X = A^{-1} * B'
         WRITE( NOUT, FMT = * )
      END IF
      CALL PDLAPRNT( N, NRHS, MEM( IPB ), 1, 1, DESCB, 0, 0, 'X', NOUT,
     $               MEM( IPW ) )
      CALL PDLAWRITE( 'SCAEXSOL.dat', N, NRHS, MEM( IPB ), 1, 1, DESCB,
     $                0, 0, MEM( IPW ) )
*
*     Compute residual ||A * X - B|| / ( ||X|| * ||A|| * eps * N )
      EPS = PDLAMCH( ICTXT, 'Epsilon' )
      ANORM = PDLANGE( 'I', N, N, MEM( IPA ), 1, 1, DESCA, MEM( IPW ) )
      BNORM = PDLANGE( 'I', N, NRHS, MEM( IPB ), 1, 1, DESCB,
     $                 MEM( IPW ) )
      CALL PDGEMM( 'No transpose', 'No transpose', N, NRHS, N, ONE,
     $             MEM( IPACPY ), 1, 1, DESCA, MEM( IPB ), 1, 1, DESCB,
     $             -ONE, MEM( IPX ), 1, 1, DESCX )
      XNORM = PDLANGE( 'I', N, NRHS, MEM( IPX ), 1, 1, DESCX,
     $                 MEM( IPW ) )
      RESID = XNORM / ( ANORM * BNORM * EPS * DBLE( N ) )
      ELAPSED_TIME = MPI_WTIME() - START_TIME   ! <- added this
*
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         WRITE( NOUT, FMT = * )
         WRITE( NOUT, FMT = * )
     $     '||A * X - B|| / ( ||X|| * ||A|| * eps * N ) = ', RESID
         WRITE( NOUT, FMT = * )
         IF( RESID.LT.10.0D+0 ) THEN
            WRITE( NOUT, FMT = * ) 'The answer is correct.'
            WRITE( NOUT, FMT = * ) 1000.0*ELAPSED_TIME   ! <- added this
         ELSE
            WRITE( NOUT, FMT = * ) 'The answer is suspicious.'
            WRITE( NOUT, FMT = * ) 1000.0*ELAPSED_TIME   ! <- added this
         END IF
      END IF
Now the elapsed time that I'm getting does not seem consistent at all - multiple runs result in execution times that are quite different.
I'm running this as a cluster job with qsub. Is there a way to get the execution time through the reservation system, without changing the code?
For my experiments I need a small number of large blocks. When I try to increase the block size in SCAEX.dat:
e.g. from:
'ScaLAPACK Example Program 2'
'May 1997'
'SCAEX.out' output file name (if any)
400 device out
400 value of N
400 value of NRHS
200 values of NB
2 values of NPROW
2 values of NPCOL
to:
'ScaLAPACK Example Program 2'
'May 1997'
'SCAEX.out' output file name (if any)
400 device out
400 value of N
400 value of NRHS
400 values of NB
1 values of NPROW
1 values of NPCOL
I get:
Unable to perform test: need TOTMEM of at least 5126408
Bad MEMORY parameters: going on to next test case.
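You can use the shell command time instead. If your executable is a.out, put time in front of the line that launches it in your shell script. You can then read the elapsed time from the job's output (std.out). Note that time normally prints its report on standard error, so check the job's error file if the numbers do not show up there.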
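As a rough sketch of what that could look like in a job script (not a definitive recipe): the file name time.log and the CMD variable are assumptions made for illustration, and "sleep 1" is only a placeholder for whatever line actually launches your executable.

```shell
#!/bin/bash
# Hypothetical job script; submit it with qsub as usual.
# CMD is a placeholder (assumption): replace 'sleep 1' with the line that
# launches your program, e.g. your MPI launcher followed by ./a.out.
CMD="${CMD:-sleep 1}"

# Time the launch line. The program's standard output goes to std.out;
# the timing report, which 'time' writes to standard error, goes to
# time.log (together with anything the program prints on standard error).
{ time -p $CMD > std.out ; } 2> time.log

# Echo the wall-clock seconds (the "real" line) into the job output too.
grep '^real' time.log
```

Keep in mind that this measures the whole run, including MPI start-up and the printing of the solution, not only the PDGESV call, so it will not match the MPI_WTIME figure exactly.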
Also, TOTMEM = 2000000 < 5126408 in the source file, so increase TOTMEM to at least 5126408 and recompile.