In our numerical software I encountered a strange bug after upgrading our cluster:

```
At line 501 of file /home/weser/code/neci/src/fcimc_helper.F90 (unit = 6, file = '/dev/null')
Fortran runtime error: End of record
```
That line contains a `print *,` statement that writes to stdout.
In our program, the STDOUT of every non-root MPI process is closed and reopened to write to `/dev/null`.
(In Debug mode, the STDOUT of every non-root MPI process is instead redirected to a separate file.)
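For illustration, the redirection described above could look roughly like the following sketch. The module `stdout_redirect`, the subroutine `redirect_stdout` and the per-rank file name `stdout_rank_<rank>.log` are hypothetical and not taken from our code; `/dev/null` assumes a POSIX system. Note that the `open` statement passes no `recl=` argument, so the processor's default record length applies.

```fortran
! Hypothetical sketch: on a non-root rank, reopen the preconnected stdout
! unit either on /dev/null or, in debug mode, on a per-rank file.
! Module, subroutine and file-name pattern are illustrative assumptions.
module stdout_redirect
    use iso_fortran_env, only: stdout => output_unit
    implicit none
    private
    public :: redirect_stdout

contains

    subroutine redirect_stdout(rank, debug, ios)
        integer, intent(in) :: rank    !< MPI rank of the calling process
        logical, intent(in) :: debug   !< .true.: per-rank file, .false.: /dev/null
        integer, intent(out) :: ios    !< iostat of the open statement (0 on success)
        character(len=64) :: fname

        if (debug) then
            ! Hypothetical naming scheme, e.g. "stdout_rank_3.log"
            write(fname, '(a, i0, a)') 'stdout_rank_', rank, '.log'
        else
            fname = '/dev/null'
        end if

        close(stdout, status="keep")
        ! No recl=... here: a fixed record length would limit every
        ! record written to this unit, whether it is a file or /dev/null.
        open(stdout, file=trim(fname), action="write", iostat=ios)
    end subroutine redirect_stdout

end module stdout_redirect
```

In the MPI program this would be called as `if (rank /= root) call redirect_stdout(rank, debug_mode, ios)`, where `debug_mode` is again a hypothetical flag.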
I tried to reproduce the problem with the following minimal example:
```fortran
program stdout_to_dev_null
    use iso_fortran_env, only: stdout => output_unit
    use mpi_f08 ! also works with plain mpi
    implicit none(type, external)
    integer :: rank, n_procs, ierror
    integer, parameter :: root = 0

    call MPI_INIT(ierror)
    call MPI_COMM_SIZE(MPI_COMM_WORLD, n_procs, ierror)
    call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)

    if (rank /= root) then
        close(stdout, status="keep")
        open(stdout, file="/dev/null", recl=8192)
    end if

    write(stdout, *) 'Size is ', n_procs
    write(stdout, *) 'node', rank, ': Hello world'

    block
        integer :: i
        character(:), allocatable :: large_string
        allocate(character(len=5000) :: large_string)
        do i = 1, len(large_string)
            large_string(i : i) = 'A'
        end do
        write(stdout, *) large_string
    end block

    call MPI_FINALIZE(ierror)
end program
```
The problem is that this minimal example works exactly as expected, both when run manually with `mpirun` and when submitted to the cluster like our other heavy calculations.
Now I have three questions:

1. Is closing and reopening STDOUT like this undefined behaviour, and am I simply lucky that the minimal example works?
2. How can there be an end of record on `/dev/null`?
3. How can I properly fix this problem?
The problem has nothing to do with MPI, nor with the difference between the clusters.¹ It is problematic code that fails with gfortran but works under ifort by pure luck.

If a file is opened with a fixed record length (`recl=...`), a write statement must not produce a record longer than this length, even if the output goes to `/dev/null`. In the minimal example the 5000-character string still fits into the record length of 8192, which is why it runs without error. The fix is simply not to open the file with a fixed record length, i.e. to omit the `recl=...` argument.

Apparently the runtime library of ifort is more permissive and even works if the written record is longer than the record length specified in the `open` statement. Under gfortran, the same write fails with the `End of record` runtime error quoted in the question.

¹ The relevant difference between the old and the new cluster for this problem is that we use gfortran + OpenMPI on the new one and ifort + IntelMPI on the old one.
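The record-length behaviour can be checked without MPI. The following is a minimal sketch, assuming a POSIX `/dev/null`; the module `recl_check` and the function `write_iostat` are hypothetical names. It performs one formatted write of `line_len` characters, with or without `recl=`, and returns the `iostat` value instead of aborting.

```fortran
! Hypothetical sketch: report the iostat of writing one record of
! line_len characters to /dev/null, with or without a fixed RECL.
module recl_check
    implicit none
    private
    public :: write_iostat

contains

    !> Returns the iostat of a single formatted write of line_len 'A's.
    !> recl_len > 0: unit opened with recl=recl_len.
    !> recl_len <= 0: unit opened without recl= (processor default).
    integer function write_iostat(line_len, recl_len) result(ios)
        integer, intent(in) :: line_len, recl_len
        integer :: u

        if (recl_len > 0) then
            open(newunit=u, file="/dev/null", action="write", recl=recl_len, iostat=ios)
        else
            open(newunit=u, file="/dev/null", action="write", iostat=ios)
        end if
        if (ios /= 0) return

        ! With '(a)' the record is exactly line_len characters long; if that
        ! exceeds recl_len, gfortran reports an "End of record" error, which
        ! iostat= turns into a nonzero return value instead of an abort.
        write(u, '(a)', iostat=ios) repeat('A', line_len)
        close(u)
    end function write_iostat

end module recl_check
```

With gfortran, `write_iostat(20, 10)` is expected to return a nonzero value, while `write_iostat(5, 10)` and `write_iostat(20000, 0)` return 0. Under ifort, as described above, the first call may also return 0, which is exactly what hid the bug on the old cluster.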