MPI Allreduce error on MPICH 3.1.5 on ARMv7


I have a small cluster of four Raspberry Pi 2 Model Bs. I'm using them to experiment with distributed computing.

I have some code on Github: https://github.com/gordon1992/RPI-Cluster-Scratch

hello_world_c runs fine. global_sum_c (and its _f95 counterpart) fail with the output below:

pi@rpi-cluster-1 ~/RPI-Cluster-Scratch/global_sum_c $ ./run.sh 
Assertion failed in file src/mpi/coll/helper_fns.c at line 491: status->MPI_TAG == recvtag
internal ABORT - process 0

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 26572 RUNNING AT rpi-cluster-1
=   EXIT CODE: 1
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:2@rpi-cluster-3] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:2@rpi-cluster-3] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:2@rpi-cluster-3] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:3@rpi-cluster-4] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:3@rpi-cluster-4] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:3@rpi-cluster-4] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:1@rpi-cluster-2] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:1@rpi-cluster-2] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status
[proxy:0:1@rpi-cluster-2] main (pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@rpi-cluster-1] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@rpi-cluster-1] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@rpi-cluster-1] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:218): launcher returned error waiting for completion
[mpiexec@rpi-cluster-1] main (ui/mpich/mpiexec.c:336): process manager error waiting for completion

I'm using gcc 4.9.2 and MPICH 3.1-5 (installed from the Debian testing repositories).

The code runs fine when executed on a single device (say, rpi-cluster-1), but fails when run across multiple devices.

I'm somewhat confused as to what the problem could be, especially since MPI_COMM_WORLD is an intra-communicator and MPI_IN_PLACE is only invalid for inter-communicators.
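For reference, the pattern in question boils down to something like the following minimal sketch (an assumption on my part, not the exact code from the repository): each rank contributes a local value and MPI_IN_PLACE lets the same buffer serve as both send and receive buffer, which the MPI standard permits on intra-communicators such as MPI_COMM_WORLD.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, size, sum;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank contributes its own rank number; MPI_IN_PLACE means
         * `sum` is used as both input and output of the reduction. */
        sum = rank;
        MPI_Allreduce(MPI_IN_PLACE, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        /* With 4 ranks, every rank should report the same total: 0+1+2+3 = 6. */
        printf("Rank %d of %d: global sum = %d\n", rank, size, sum);

        MPI_Finalize();
        return 0;
    }

This runs correctly for me on a single node (e.g. mpiexec -n 4 ./global_sum on rpi-cluster-1 alone), and it is only the multi-node runs that hit the helper_fns.c assertion.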

Any ideas?

Thanks.
