Problems Initialising a grid using Cblacs in MPI


I'm trying to set up a very simple 1 * 2 grid using the following code:


        int nprow, npcol, myrow, mycol, myid;
        char rowcol[1] = "R";

        nprow = 1;
        npcol = size / nprow;
        if(npcol * nprow != size){
                printf("Error");
                MPI_Finalize();
                exit(1);
        }

        Cblacs_pinfo(&myid, &size);
        Cblacs_get(0, 0, &ictxt);
        Cblacs_gridinit(&ictxt, rowcol, nprow, npcol);
        Cblacs_pcoord(ictxt, myid, &myrow, &mycol);

        printf("rank = %d, nprow = %d, npcol = %d, myrow = %d, mycol = %d\n",
               rank, nprow, npcol, myrow, mycol);
}

The problem is that Cblacs_pcoord seems to change nprow to 0 no matter what it is initially set to, which in turn gives 0 for every myrow. The npcol and mycol values are always correct for any number of processes. I am very confused, since this function shouldn't touch nprow, but I've printed nprow after every line of code and it holds the correct value until that function is called.

If I'm missing any information that would help you answer my question please let me know and I will update accordingly.

Answer by Mark Gates:

Short answer: Use Cblacs_gridinfo instead of Cblacs_pcoord.

Long answer: Please give a complete, minimal example. I had to fill in several details to get something that compiles.

Use const char* rowcol = "R"; instead of char rowcol[1] = "R";. Compiled as C++, the latter generates an error:

blacs.cc:22:22: error: initializer-string for char array is too long
    char rowcol[1] = "R";
                     ^~~

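A 1-character array has no room for "R", which is 2 characters including the terminating null char. C++ rejects the declaration. C accepts it but stores no terminator, so any code that reads rowcol as a string could run past the end of the array.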
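To make the size mismatch concrete, here is a minimal sketch that is independent of BLACS. The names order_ptr, order_arr, and bytes_needed are illustrative only and do not appear in the original program. It checks at compile time that the literal "R" occupies two bytes and shows two declarations that leave room for the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The literal "R" is the character 'R' plus a terminating '\0'.
static_assert( sizeof( "R" ) == 2, "\"R\" occupies two bytes" );

// Two declarations that leave room for the terminator.
const char* order_ptr   = "R";  // pointer to the literal
const char  order_arr[] = "R";  // array sized by the compiler (2 bytes)

// Hypothetical helper: bytes needed to store a C string, terminator included.
std::size_t bytes_needed( const char* s )
{
    assert( s != nullptr );
    return std::strlen( s ) + 1;
}
```

Either declaration can be passed where a const char* order argument is expected. bytes_needed( order_ptr ) evaluates to 2, which is why a char[1] cannot hold the literal.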

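I'm not sure which part of your output was wrong; your code seems to work for me. I still recommend using Cblacs_gridinfo instead of Cblacs_pcoord, because that's what ScaLAPACK does everywhere, and I think pcoord is broken for col-major grids. Note that gridinfo also returns nprow and npcol, so it reports the grid shape that BLACS actually created.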

Here's complete code.

#include <mpi.h>
#include <stdio.h>   // printf
#include <stdlib.h>  // exit

extern "C" void Cblacs_pinfo( int* mypnum, int* nprocs );
extern "C" void Cblacs_get( int context, int request, int* value );
extern "C" void Cblacs_gridinit( int* context, const char* order, int np_row, int np_col );
extern "C" void Cblacs_gridinfo( int context, int*  np_row, int* np_col, int*  my_row, int*  my_col );
extern "C" void Cblacs_gridexit( int context );
extern "C" void Cblacs_exit( int error_code );
extern "C" void Cblacs_abort( int context, int error_code );
extern "C" void Cblacs_pcoord( int icontxt, int pnum, int* prow, int* pcol );

int main( int argc, char** argv )
{
    MPI_Init( &argc, &argv );

    int rank;
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    int nprow, npcol, myrow, mycol, myid;
    //char rowcol[1] = "R";
    const char* rowcol = "R";

    int size;
    Cblacs_pinfo( &myid, &size );

    nprow = 1;
    npcol = size / nprow;
    if (npcol * nprow != size) {
        printf( "Error" );
        MPI_Finalize();
        exit( 1 );
    }

    int ictxt;
    Cblacs_get( 0, 0, &ictxt );
    Cblacs_gridinit( &ictxt, rowcol, nprow, npcol );
    //Cblacs_pcoord( ictxt, myid, &myrow, &mycol );
    Cblacs_gridinfo( ictxt, &nprow, &npcol, &myrow, &mycol );

    printf( "rank = %d, nprow = %d, npcol = %d, myrow = %d, mycol = %d\n",
            rank, nprow, npcol, myrow, mycol );

    MPI_Finalize();
    return 0;
}

Compile and run:

> mpicxx -o blacs blacs.cc -lscalapack -lgfortran

> mpirun -np 4 ./blacs
rank = 0, nprow = 1, npcol = 4, myrow = 0, mycol = 0
rank = 1, nprow = 1, npcol = 4, myrow = 0, mycol = 1
rank = 2, nprow = 1, npcol = 4, myrow = 0, mycol = 2
rank = 3, nprow = 1, npcol = 4, myrow = 0, mycol = 3
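
As a cross-check on the output above, the coordinates can be predicted without calling BLACS at all. This is a minimal sketch, assuming the natural orderings described in the BLACS documentation: order 'C' fills columns first, and any other value fills rows first. GridCoord and expected_coord are hypothetical names, not BLACS functions.

```cpp
#include <cassert>

// Hypothetical helper type; not part of the BLACS API.
struct GridCoord { int row; int col; };

// Expected coordinates of process `pnum` in an nprow x npcol grid.
// order 'C': consecutive process numbers fill a column first (column-major).
// otherwise: consecutive process numbers fill a row first (row-major).
GridCoord expected_coord( char order, int pnum, int nprow, int npcol )
{
    assert( nprow > 0 && npcol > 0 );
    assert( pnum >= 0 && pnum < nprow * npcol );
    if (order == 'C' || order == 'c') {
        return { pnum % nprow, pnum / nprow };
    }
    return { pnum / npcol, pnum % npcol };
}
```

For the 1 x 4 row-major grid in the run above, expected_coord( 'R', 2, 1, 4 ) gives row 0 and column 2, matching the line printed by rank 2. Comparing these predictions against the values from Cblacs_gridinfo (or Cblacs_pcoord) is one way to spot a coordinate that has been overwritten or mis-reported.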