MVAPICH on multi-GPU causes Segmentation fault


I'm using MVAPICH2 2.1 on a Debian 7 machine with multiple Tesla K40m cards. The code is as follows.

#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <cuda_runtime.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Status status;
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    // both ranks use GPU 0 for the first transfer
    cudaSetDevice(0);
    if (rank == 0) {
        srand(time(0));
        float* a;
        float num = rand();
        cudaMalloc(&a, sizeof(float));
        cudaMemcpy(a, &num, sizeof(float), cudaMemcpyDefault);
        MPI_Send(a, sizeof(float), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        printf("sent %f\n", num);
    } else {
        float* a;
        float num;
        cudaMalloc(&a, sizeof(float));
        MPI_Recv(a, sizeof(float), MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
        cudaMemcpy(&num, a, sizeof(float), cudaMemcpyDefault);
        printf("received %f\n", num);
    }
    // switch both ranks to GPU 1 for the second transfer
    cudaSetDevice(1);
    if (rank == 0) {
        float* a;
        float num = rand();
        cudaMalloc(&a, sizeof(float));
        cudaMemcpy(a, &num, sizeof(float), cudaMemcpyDefault);
        MPI_Send(a, sizeof(float), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        printf("sent %f\n", num);
    } else {
        float* a;
        float num;
        cudaMalloc(&a, sizeof(float));
        MPI_Recv(a, sizeof(float), MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
        cudaMemcpy(&num, a, sizeof(float), cudaMemcpyDefault);
        printf("received %f\n", num);
    }
    MPI_Finalize();
    return 0;
}

In short, I first set the device to GPU 0 and send a value, then set the device to GPU 1 and send another value.

The output is as follows.

sent 1778786688.000000
received 1778786688.000000
[debian:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[debian:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 7. MPI process died?
[debian:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[debian:mpispawn_0][child_handler] MPI process (rank: 0, pid: 30275) terminated with signal 11 -> abort job
[debian:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node debian aborted: Error while reading a PMI socket (4)

So the first send works. But as soon as I switch to the other GPU and call MPI_Send again, it crashes. I wonder why this is happening.

Also, I built MVAPICH with the following command.

./configure --enable-cuda --with-cuda=/usr/local/cuda --with-device=ch3:mrail --enable-rdma-cm

I also built with debugging enabled and captured the stack trace below. Hopefully this helps.

sent 1377447040.000000
received 1377447040.000000
[debian:mpi_rank_0][error_sighandler] Caught error: Segmentation fault (signal 11)
[debian:mpi_rank_0][print_backtrace]   0: /home/lyt/local/lib/libmpi.so.12(print_backtrace+0x1c) [0x7fba26a00b3c]
[debian:mpi_rank_0][print_backtrace]   1: /home/lyt/local/lib/libmpi.so.12(error_sighandler+0x59) [0x7fba26a00c39]
[debian:mpi_rank_0][print_backtrace]   2: /lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0) [0x7fba23ffe8d0]
[debian:mpi_rank_0][print_backtrace]   3: /usr/lib/libcuda.so.1(+0x21bb30) [0x7fba26fa9b30]
[debian:mpi_rank_0][print_backtrace]   4: /usr/lib/libcuda.so.1(+0x1f6695) [0x7fba26f84695]
[debian:mpi_rank_0][print_backtrace]   5: /usr/lib/libcuda.so.1(+0x205586) [0x7fba26f93586]
[debian:mpi_rank_0][print_backtrace]   6: /usr/lib/libcuda.so.1(+0x17ad88) [0x7fba26f08d88]
[debian:mpi_rank_0][print_backtrace]   7: /usr/lib/libcuda.so.1(cuStreamWaitEvent+0x63) [0x7fba26ed72e3]
[debian:mpi_rank_0][print_backtrace]   8: /usr/local/cuda/lib64/libcudart.so.6.5(+0xa023) [0x7fba27cff023]
[debian:mpi_rank_0][print_backtrace]   9: /usr/local/cuda/lib64/libcudart.so.6.5(cudaStreamWaitEvent+0x1ce) [0x7fba27d2cf3e]
[debian:mpi_rank_0][print_backtrace]  10: /home/lyt/local/lib/libmpi.so.12(MPIDI_CH3_CUDAIPC_Rendezvous_push+0x17f) [0x7fba269f25bf]
[debian:mpi_rank_0][print_backtrace]  11: /home/lyt/local/lib/libmpi.so.12(MPIDI_CH3_Rendezvous_push+0xe3) [0x7fba269a0233]
[debian:mpi_rank_0][print_backtrace]  12: /home/lyt/local/lib/libmpi.so.12(MPIDI_CH3I_MRAILI_Process_rndv+0xa4) [0x7fba269a0334]
[debian:mpi_rank_0][print_backtrace]  13: /home/lyt/local/lib/libmpi.so.12(MPIDI_CH3I_Progress+0x19a) [0x7fba2699aeaa]
[debian:mpi_rank_0][print_backtrace]  14: /home/lyt/local/lib/libmpi.so.12(MPI_Send+0x6ef) [0x7fba268d118f]
[debian:mpi_rank_0][print_backtrace]  15: ./bin/minimal.run() [0x400c15]
[debian:mpi_rank_0][print_backtrace]  16: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fba23c67b45]
[debian:mpi_rank_0][print_backtrace]  17: ./bin/minimal.run() [0x400c5c]
[debian:mpispawn_0][readline] Unexpected End-Of-File on file descriptor 6. MPI process died?
[debian:mpispawn_0][mtpmi_processops] Error while reading PMI socket. MPI process died?
[debian:mpispawn_0][child_handler] MPI process (rank: 0, pid: 355) terminated with signal 11 -> abort job
[debian:mpirun_rsh][process_mpispawn_connection] mpispawn_0 from node debian8 aborted: Error while reading a PMI socket (4)

1 Answer

Answered by Narcolessico (accepted answer):

I'm afraid MVAPICH does not yet support using multiple GPUs from the same process (source: mailing list).

Advanced memory-transfer operations require storing device-specific structures (streams, events, and so on; note the cuStreamWaitEvent call in your backtrace), so unless the library explicitly supports multiple devices, I'm afraid there is no way to make your code run.
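
As a rough illustration of why (this is only a sketch of the general CUDA rule, not the actual MVAPICH internals): a CUDA stream is bound to the device that was current when it was created, so a library that sets up its streams and events once and then sees the application switch devices ends up handing the driver resources from the wrong device. Assuming a machine with at least two GPUs:

#include <cstdio>
#include <cuda_runtime.h>

__global__ void noop() {}

int main() {
    cudaStream_t s;
    cudaSetDevice(0);
    cudaStreamCreate(&s);       // s is bound to device 0

    cudaSetDevice(1);           // the application switches GPUs
    noop<<<1, 1, 0, s>>>();     // launching on a device-0 stream while
                                // device 1 is current is invalid
    printf("launch: %s\n", cudaGetErrorString(cudaGetLastError()));
    return 0;
}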

On the other hand, you can of course use multiple GPU devices by running a separate process per device.
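
For example, here is a minimal sketch of that pattern: one rank per GPU, with each rank picking its device from its rank once and never switching. The rank-modulo-device mapping and the hard-coded value are just for illustration.

#include <cstdio>
#include <cuda_runtime.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int num_devices = 0;
    cudaGetDeviceCount(&num_devices);
    cudaSetDevice(rank % num_devices);   // pin this rank to one GPU, never switch

    float* buf;
    cudaMalloc(&buf, sizeof(float));

    if (rank == 0) {
        float num = 42.0f;               // arbitrary example value
        cudaMemcpy(buf, &num, sizeof(float), cudaMemcpyDefault);
        MPI_Send(buf, sizeof(float), MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        printf("sent %f\n", num);
    } else if (rank == 1) {
        MPI_Recv(buf, sizeof(float), MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        float num;
        cudaMemcpy(&num, buf, sizeof(float), cudaMemcpyDefault);
        printf("received %f\n", num);
    }

    cudaFree(buf);
    MPI_Finalize();
    return 0;
}

Launched with two ranks on the node (one per K40m), each process only ever touches its own device, which is the configuration the CUDA-aware path is designed for.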