Parallel HDF5: "make check" hangs when running t_mpi

I've been struggling to get parallel HDF5 working on the cluster for a whole week without any progress. I would appreciate any help with this. Thanks!

I'm building Parallel HDF5 (hdf5-1.8.15-patch1) on a Lustre file system under Red Hat Enterprise Linux 5.5 x86_64. I compiled it with both Intel MPI 4.0.2 and OpenMPI 1.8, and both builds succeeded without errors. When I run "make check", both pass the serial tests but hang immediately upon entering the parallel tests (t_mpi, in particular). Eventually I have to Ctrl+C to end the run. Here is the output:

lijm@c01b03:~/yuan/hdf5-1.8.15-patch1/testpar$ make check
  CC       t_mpi.o
t_mpi.c: In function ‘test_mpio_gb_file’:
t_mpi.c:284: warning: passing argument 1 of ‘malloc’ with different width due to prototype
t_mpi.c:284: warning: request for implicit conversion from ‘void *’ to ‘char *’ not permitted in C++
t_mpi.c: In function ‘test_mpio_1wMr’:
t_mpi.c:465: warning: passing argument 2 of ‘gethostname’ with different width due to prototype
t_mpi.c: In function ‘test_mpio_derived_dtype’:
t_mpi.c:682: warning: declaration of ‘nerrors’ shadows a global declaration
t_mpi.c:37: warning: shadowed declaration is here
t_mpi.c:771: warning: passing argument 5 of ‘MPI_File_set_view’ discards qualifiers from pointer target type
t_mpi.c:798: warning: passing argument 2 of ‘MPI_File_set_view’ with different width due to prototype
t_mpi.c:798: warning: passing argument 5 of ‘MPI_File_set_view’ discards qualifiers from pointer target type
t_mpi.c:685: warning: unused variable ‘etypenew’
t_mpi.c:682: warning: unused variable ‘nerrors’
t_mpi.c: In function ‘main’:
t_mpi.c:1104: warning: too many arguments for format
t_mpi.c: In function ‘test_mpio_special_collective’:
t_mpi.c:991: warning: will never be executed
t_mpi.c:992: warning: will never be executed
t_mpi.c:995: warning: will never be executed
t_mpi.c: In function ‘test_mpio_gb_file’:
t_mpi.c:229: warning: will never be executed
t_mpi.c:232: warning: will never be executed
t_mpi.c:237: warning: will never be executed
t_mpi.c:238: warning: will never be executed
t_mpi.c:253: warning: will never be executed
t_mpi.c:258: warning: will never be executed
t_mpi.c:259: warning: will never be executed
t_mpi.c:281: warning: will never be executed
t_mpi.c:246: warning: will never be executed
t_mpi.c:267: warning: will never be executed
t_mpi.c:319: warning: will never be executed
t_mpi.c:343: warning: will never be executed
t_mpi.c:385: warning: will never be executed
t_mpi.c:389: warning: will never be executed
t_mpi.c:248: warning: will never be executed
t_mpi.c:269: warning: will never be executed
t_mpi.c: In function ‘main’:
t_mpi.c:1143: warning: will never be executed
t_mpi.c:88: warning: will never be executed
t_mpi.c:102: warning: will never be executed
t_mpi.c:133: warning: will never be executed
t_mpi.c:142: warning: will never be executed
  CCLD     t_mpi
make  t_mpi testphdf5 t_cache t_pflush1 t_pflush2 t_pshutdown t_prestart t_shapesame
make[1]: Entering directory `/home/lijm/yuan/hdf5-1.8.15-patch1/testpar'
make[1]: `t_mpi' is up to date.
make[1]: `testphdf5' is up to date.
make[1]: `t_cache' is up to date.
make[1]: `t_pflush1' is up to date.
make[1]: `t_pflush2' is up to date.
make[1]: `t_pshutdown' is up to date.
make[1]: `t_prestart' is up to date.
make[1]: `t_shapesame' is up to date.
make[1]: Leaving directory `/home/lijm/yuan/hdf5-1.8.15-patch1/testpar'
make  check-TESTS
make[1]: Entering directory `/home/lijm/yuan/hdf5-1.8.15-patch1/testpar'
make[2]: Entering directory `/home/lijm/yuan/hdf5-1.8.15-patch1/testpar'
make[3]: Entering directory `/home/lijm/yuan/hdf5-1.8.15-patch1/testpar'
make[3]: Nothing to be done for `_exec_check-s'.
make[3]: Leaving directory `/home/lijm/yuan/hdf5-1.8.15-patch1/testpar'
make[2]: Leaving directory `/home/lijm/yuan/hdf5-1.8.15-patch1/testpar'
make[2]: Entering directory `/home/lijm/yuan/hdf5-1.8.15-patch1/testpar'
===Parallel tests in testpar begin Thu Jun 11 22:07:48 CST 2015===
**** Hint ****
Parallel test files reside in the current directory by default.
Set HDF5_PARAPREFIX to use another directory. E.g.,
HDF5_PARAPREFIX=/PFS/user/me
export HDF5_PARAPREFIX
make check
**** end of Hint ****
make[3]: Entering directory `/home/lijm/yuan/hdf5-1.8.15-patch1/testpar'
============================
Testing  t_mpi
============================
 t_mpi  Test Log
============================
===================================
MPI functionality tests
===================================
Proc 1: hostname=c01b03
Proc 2: hostname=c01b03
Proc 3: hostname=c01b03
Proc 5: hostname=c01b03
--------------------------------
Proc 0: *** MPIO 1 write Many read test...
--------------------------------
Proc 0: hostname=c01b03
Proc 4: hostname=c01b03
Command exited with non-zero status 255
0.08user 0.01system 0:37.65elapsed 0%CPU (0avgtext+0avgdata    0maxresident)k
0inputs+0outputs (0major+5987minor)pagefaults 0swaps
make[3]: *** [t_mpi.chkexe_] Error 1
make[2]: *** [build-check-p] Interrupt
make[1]: *** [test] Interrupt
make: *** [check-am] Interrupt

The output above is the same for both MPI implementations, but OpenMPI additionally prints this warning:

WARNING: It appears that your OpenFabrics subsystem is configured to only allow registering part of your physical memory. This can cause MPI jobs to run with erratic performance, hang, and/or crash.

I've searched for this warning, but I don't think it can be the cause of the hang, for the reason stated at the end.

I've tried to locate the place where it hangs. What I found is that it always gets stuck on the first collective function it reaches. For example, in t_mpi it first hangs at:

MPI_File_delete(filename, MPI_INFO_NULL); (line 477),

in test_mpio_1wMr. If I comment out this line, it gets stuck at the MPI_File_open call just below instead. But I'm not sure what happens inside these functions.
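
A minimal standalone MPI-IO probe along the same lines may help separate MPI-IO from HDF5 (a sketch, not part of the test suite; the filename below is a placeholder for a file on the Lustre mount). If the problem is in the MPI-IO layer rather than in HDF5, this should hang the same way:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int         rank;
    MPI_File    fh;
    /* placeholder path: point this at a file inside the Lustre directory */
    const char *filename = "/path/on/lustre/mpiio_probe.tmp";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* remove any stale file (t_mpi first blocks on MPI_File_delete) */
    if (rank == 0)
        MPI_File_delete((char *) filename, MPI_INFO_NULL);
    MPI_Barrier(MPI_COMM_WORLD);

    /* collective open (the next place t_mpi blocks) */
    MPI_File_open(MPI_COMM_WORLD, (char *) filename,
                  MPI_MODE_RDWR | MPI_MODE_CREATE, MPI_INFO_NULL, &fh);
    MPI_File_close(&fh);

    if (rank == 0)
        printf("MPI-IO delete/open/close completed\n");

    MPI_Finalize();
    return 0;
}

Compiling this with mpicc and running it with mpirun -np 6, with the path pointing at the Lustre directory, mirrors how the hanging test is launched.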

There is another thing I noticed. The HDF5 directory where I run "make" is on an NFS file system, and I can only access Lustre through a particular directory located somewhere else. I found that the tests run fine if I don't set HDF5_PARAPREFIX to my Lustre directory, since the tests then run in the local (NFS) directory by default. So I suppose this is an issue with Lustre itself, not a memory limit?

Thank you!

1 Answer

Rob Latham answered:

It's hard to say what's going on here.

It may be that your MPI-IO layer is treating your Lustre file system as a generic UNIX file system instead of using its Lustre driver. Intel MPI requires two environment variables (I_MPI_EXTRA_FILESYSTEM and I_MPI_EXTRA_FILESYSTEM_LIST) to enable the Lustre-optimized code paths (see https://press3.mcs.anl.gov/romio/2014/06/12/romio-and-intel-mpi/ for more details).
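
For example (the exact values come from the linked post; double-check them against your Intel MPI version's documentation):

I_MPI_EXTRA_FILESYSTEM=on
I_MPI_EXTRA_FILESYSTEM_LIST=lustre
export I_MPI_EXTRA_FILESYSTEM I_MPI_EXTRA_FILESYSTEM_LIST
make check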

You'll have to explicitly request Lustre support when you build OpenMPI, too.

It would help a lot if you could attach a debugger to one or more of the stuck processes to see where they are hanging. Are they stuck in an I/O routine? Stuck in communication?
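
For example, on the node where the processes are stuck (the PIDs are placeholders; gstack ships with the gdb package on RHEL):

gstack <pid-of-a-stuck-t_mpi-process>
gdb -p <pid-of-a-stuck-t_mpi-process>    # then: thread apply all bt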