Writing a configurable ScaLAPACK linear system solver that prints execution time


I'm trying to adapt the following example program to use as a coarse-grained parallel benchmark in my experiments.

I added the following lines to the code:

      START_TIME = MPI_WTIME()                  ! <- added this
      CALL PDGESV( N, NRHS, MEM( IPA ), 1, 1, DESCA, MEM( IPPIV ),
     $             MEM( IPB ), 1, 1, DESCB, INFO )
*
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         WRITE( NOUT, FMT = * )
         WRITE( NOUT, FMT = * ) 'INFO code returned by PDGESV = ', INFO
         WRITE( NOUT, FMT = * )
         WRITE( NOUT, FMT = * ) 'Matrix X = A^{-1} * B'
         WRITE( NOUT, FMT = * )
      END IF
      CALL PDLAPRNT( N, NRHS, MEM( IPB ), 1, 1, DESCB, 0, 0, 'X', NOUT,
     $               MEM( IPW ) )
      CALL PDLAWRITE( 'SCAEXSOL.dat', N, NRHS, MEM( IPB ), 1, 1, DESCB,
     $                0, 0, MEM( IPW ) )
*
*     Compute residual ||A * X  - B|| / ( ||X|| * ||A|| * eps * N )
      EPS = PDLAMCH( ICTXT, 'Epsilon' )
      ANORM = PDLANGE( 'I', N, N, MEM( IPA ), 1, 1, DESCA, MEM( IPW ) )
      BNORM = PDLANGE( 'I', N, NRHS, MEM( IPB ), 1, 1, DESCB,
     $                 MEM( IPW ) )


      CALL PDGEMM( 'No transpose', 'No transpose', N, NRHS, N, ONE,
     $             MEM( IPACPY ), 1, 1, DESCA, MEM( IPB ), 1, 1, DESCB,
     $             -ONE, MEM( IPX ), 1, 1, DESCX )
      XNORM = PDLANGE( 'I', N, NRHS, MEM( IPX ), 1, 1, DESCX,
     $                 MEM( IPW ) )
      RESID = XNORM / ( ANORM * BNORM * EPS * DBLE( N ) )
      ELAPSED_TIME = MPI_WTIME() - START_TIME   ! <- added this
*
      IF( MYROW.EQ.0 .AND. MYCOL.EQ.0 ) THEN
         WRITE( NOUT, FMT = * )
         WRITE( NOUT, FMT = * )
     $     '||A * X  - B|| / ( ||X|| * ||A|| * eps * N ) = ', RESID
         WRITE( NOUT, FMT = * )
         IF( RESID.LT.10.0D+0 ) THEN
            WRITE( NOUT, FMT = * ) 'The answer is correct.'
            WRITE( NOUT, FMT = * ) 1000.0*ELAPSED_TIME   ! <- added this
         ELSE
            WRITE( NOUT, FMT = * ) 'The answer is suspicious.'
            WRITE( NOUT, FMT = * ) 1000.0*ELAPSED_TIME   ! <- added this
         END IF
      END IF

The elapsed time I get is not consistent at all: multiple runs report quite different execution times.

I'm running this as a cluster job with qsub. Is there a way to get the execution time through the reservation system, without changing the code?

For my experiments I need a small number of large blocks. When I try to increase the block size in SCAEX.dat:

e.g. from:

'ScaLAPACK Example Program 2'
'May 1997'
'SCAEX.out'             output file name (if any)
400                       device out
400                       value of N
400                       value of NRHS
200                       values of NB
2                       values of NPROW
2                       values of NPCOL

to:

'ScaLAPACK Example Program 2'
'May 1997'
'SCAEX.out'             output file name (if any)
400                       device out
400                       value of N
400                       value of NRHS
400                       values of NB
1                       values of NPROW
1                       values of NPCOL

I get:

Unable to perform test: need TOTMEM of at least    5126408
Bad MEMORY parameters: going on to next test case.

There is 1 answer

ztik

You can use the shell command time instead. If your executable is a.out, use the following in your shell script:

$ time ./a.out

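Then you can read the time from the job's output. Note that in bash, time reports on standard error rather than standard output, so under qsub it will normally appear in the job's error file.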
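
A minimal sketch of this approach, assuming the job script runs under bash. The function run_solver is a hypothetical stand-in for the real launch line (for example an mpirun of a.out), and elapsed.txt is an arbitrary file name. bash's time keyword reports on standard error, so the sketch redirects that stream to capture the wall-clock seconds:

```shell
#!/bin/bash
# Hypothetical stand-in for the real solver launch (e.g. mpirun ... ./a.out).
run_solver() {
    sleep 0.2
}

# TIMEFORMAT controls what bash's `time` keyword prints; %R is wall-clock seconds.
TIMEFORMAT='%R'

# `time` reports on the shell's stderr, so redirect the group's stderr to a file.
# The solver's own output is discarded here so the file holds only the timing.
{ time run_solver >/dev/null 2>&1 ; } 2> elapsed.txt

echo "wall-clock seconds: $(cat elapsed.txt)"
```

This measures the whole process, including start-up and any file output, so it will be larger than a timer placed around the solver call alone.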

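As for the memory error: TOTMEM is set to 2000000 in the source file, which is less than the 5126408 that this test case needs. Increase TOTMEM to at least that value and recompile.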
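
The arithmetic behind that number can be sketched roughly as follows. This is an estimate under stated assumptions, not the example program's exact formula: it assumes TOTMEM is counted in bytes, that each double takes 8 bytes, and that the four matrices visible in the code (A, the copy of A, B, and X) dominate the workspace. The helper local_bytes is hypothetical and approximates each process's share by rounding up N / NPROW and N / NPCOL:

```shell
#!/bin/bash
# Rough lower bound on the workspace one process needs, in bytes.
# Assumption: A, its copy, B and X dominate; pivots and scratch space are ignored.
local_bytes() {
    n=$1; nrhs=$2; nprow=$3; npcol=$4
    rows=$(( (n + nprow - 1) / nprow ))        # local rows, rounded up
    acols=$(( (n + npcol - 1) / npcol ))       # local columns of A and its copy
    bcols=$(( (nrhs + npcol - 1) / npcol ))    # local columns of B and X
    echo $(( 8 * (2 * rows * acols + 2 * rows * bcols) ))
}

echo "2x2 grid: $(local_bytes 400 400 2 2) bytes"
echo "1x1 grid: $(local_bytes 400 400 1 1) bytes"
```

On the 2x2 grid the estimate is 1280000 bytes, which fits under 2000000. On the 1x1 grid a single process holds everything, and the estimate of 5120000 bytes is already close to the reported 5126408.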