Undefined behavior when sending a std::vector of a user-defined class in MPI

Following tutorials on registering a new MPI datatype, I've registered my simple class, which has two members, using MPI_Type_create_struct. The problem is that if the size of the std::vector grows to 20000 (this of course varies from system to system, depending on the available stack size), I get:

 Read -1, expected 800000, errno = 14
 Read -1, expected 800000, errno = 14
*** Process received signal ***
Signal: Segmentation fault (11)
Signal code: Address not mapped (1)

For smaller sizes the code runs fine. By changing the class definition and adding larger members, I can see why this is happening, but I don't know how to resolve it. For example, if I add an array of size 1000 as a member of this class, whether as a static array or a std::vector, I can't even allocate a std::vector<Dummy> of 2 million elements serially. Precisely, std::vector<Dummy> TobeSend(containerSize, Dummy(target)) throws std::bad_alloc even when running with a single process. I think the issue is using std::vector<Dummy> as the send buffer; the usual best practice would be std::vector<Dummy*>, but since this is MPI, sending pointers won't work. I'd appreciate any ideas on this.
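
To rule out a layout mismatch between the class and the registered datatype, this is the kind of sanity check I have in mind (checkDummyType is just an illustrative helper name; it assumes the Dummy class and the committed dummyType from the code below):

#include <cstdio>

// Hypothetical helper: compare what MPI reports for the committed type
// with what the compiler actually lays out for Dummy.
void checkDummyType(MPI_Datatype dummyType)
{
    int typeSize = 0;
    MPI_Aint lb = 0, extent = 0;
    MPI_Type_size(dummyType, &typeSize);           // bytes of data per element
    MPI_Type_get_extent(dummyType, &lb, &extent);  // spacing between consecutive elements

    std::printf("sizeof(Dummy) = %zu, MPI size = %d, MPI extent = %lld\n",
                sizeof(Dummy), typeSize, (long long)extent);
    // With one int64_t plus a 2x2 int64_t array I expect 8 + 32 = 40 bytes,
    // and 20000 * 40 = 800000 matches the "expected 800000" in the error above.
}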

#include <mpi.h>
#include <vector>
#include <cstdint>   // int64_t
#include <stddef.h>  // offsetof
#include <assert.h>
#include <math.h>
using namespace std;
typedef int64_t emInt;
class Dummy
{
private:
    emInt N ; 
    emInt Array[2][2]; 
public:
    
    Dummy(const int64_t nDivs) : 
    N(nDivs)
    {

    };
    friend MPI_Datatype register_mpi_type(Dummy const&); 
    ~Dummy() {}; 
};

int main (int argc, char* argv[])
{
    
    MPI_Init(&argc, &argv);     
    int rank, size;
    int tag = 0;
    int flag = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Request request;
    MPI_Status status;
    Dummy object (1); 
    MPI_Datatype dummyType = register_mpi_type(object);

    std::size_t containerSize = 20000;  

    std::vector<int> neighbrs = {0, 1, 2, 3};
    for(auto itri:neighbrs)
    {
        int target = itri; 
        if(rank!= target)
        {
            std::vector<Dummy> TobeSend(containerSize, Dummy(target));
            MPI_Isend(TobeSend.data(), containerSize, dummyType,
                      target, tag, MPI_COMM_WORLD, &request);
            
        }

    }
    for(auto isource:neighbrs)
    {
        int source = isource;
        if(rank!= source)
        {
            std::vector<Dummy> TobeRecvd(containerSize, Dummy(0));
            MPI_Irecv(TobeRecvd.data(), containerSize, dummyType, source, tag,
                      MPI_COMM_WORLD, &request);
        }
    }
    MPI_Finalize(); 
    return 0; 
}

MPI_Datatype register_mpi_type(Dummy const&)
{
    Dummy object(1);

    MPI_Datatype builtType;

    constexpr std::size_t numMembers = 2;

    MPI_Datatype types[numMembers] =
    {
        MPI_INT64_T,
        MPI_INT64_T
    };

    int arrayOfBlockLengths[numMembers];

    arrayOfBlockLengths[0] = 1;  // N
    arrayOfBlockLengths[1] = 4;  // Array[2][2]: four int64_t

    MPI_Aint baseadress ; 

    MPI_Aint arrayOfDisplacements[numMembers] =
    {
        offsetof(Dummy, N),
        offsetof(Dummy, Array)
    };
    MPI_Type_create_struct(numMembers,arrayOfBlockLengths,arrayOfDisplacements,types,&builtType);
    MPI_Type_commit(&builtType);
    return builtType; 
};
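
One thing I'm not sure about is whether the non-blocking calls are allowed to outlive the vectors they point to. If they aren't, I imagine the loops would have to keep every buffer and request alive until a final wait, roughly like this (just a sketch reusing neighbrs, containerSize, dummyType, rank and tag from above; not what my code currently does):

    std::vector<std::vector<Dummy>> sendBuffers, recvBuffers;
    std::vector<MPI_Request> requests;

    for (int target : neighbrs)
    {
        if (rank != target)
        {
            sendBuffers.emplace_back(containerSize, Dummy(target));
            requests.emplace_back();
            MPI_Isend(sendBuffers.back().data(), (int)containerSize, dummyType,
                      target, tag, MPI_COMM_WORLD, &requests.back());
        }
    }
    for (int source : neighbrs)
    {
        if (rank != source)
        {
            recvBuffers.emplace_back(containerSize, Dummy(0));
            requests.emplace_back();
            MPI_Irecv(recvBuffers.back().data(), (int)containerSize, dummyType,
                      source, tag, MPI_COMM_WORLD, &requests.back());
        }
    }
    // Buffers stay in scope until every request has completed.
    MPI_Waitall((int)requests.size(), requests.data(), MPI_STATUSES_IGNORE);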

CallStack:

#0  0x00007ffff7b6e94d in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007ffff4d0b244 in ?? () from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so
#2  0x00007ffff4769556 in mca_pml_ob1_send_request_schedule_once () from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so
#3  0x00007ffff4767811 in mca_pml_ob1_recv_frag_callback_ack () from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_pml_ob1.so
#4  0x00007ffff4d0fae5 in mca_btl_vader_poll_handle_frag () from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so
#5  0x00007ffff4d0fdb1 in ?? () from /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi3/mca_btl_vader.so
#6  0x00007ffff7942714 in opal_progress () from /lib/x86_64-linux-gnu/libopen-pal.so.40
#7  0x00007ffff7e9fc0d in ompi_mpi_finalize () from /lib/x86_64-linux-gnu/libmpi.so.40
#8  0x000055555555dd9c in main (argc=1, argv=0x7fffffffd438) at classicDataType.cpp:159