End of record when writing to /dev/null


In our numerical software I encountered a strange bug after upgrading our cluster:

At line 501 of file /home/weser/code/neci/src/fcimc_helper.F90 (unit = 6, file = '/dev/null')
Fortran runtime error: End of record

In this line there is a print *, statement that prints to stdout.

In our program the STDOUT of all non-root MPI processes is closed and reopened to write to /dev/null. (Except in Debug mode, then the STDOUT of every non-root MPI process is redirected to a separate file.)

I tried to create a minimal example for this problem which looks like this:

  program stdout_to_dev_null
      use iso_fortran_env, only: stdout => output_unit
      use mpi_f08  ! also works with plain mpi
      implicit none(type, external)
  
      integer :: rank, n_procs, ierror
      integer, parameter :: root = 0 
  
      call MPI_INIT(ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, n_procs, ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
  
      if (rank /= root) then
          close(stdout, status="keep")
          open(stdout, file="/dev/null", recl=8192)
      end if
  
      write(stdout, *) 'Size is ', n_procs
      write(stdout, *) 'node', rank, ': Hello world'
  
      block
          integer :: i
          character(:), allocatable :: large_string
          allocate(character(len=5000) :: large_string)
  
          do i = 1, len(large_string)
              large_string(i : i) = 'A' 
          end do
  
          write(stdout, *) large_string
      end block
  
  
      call MPI_FINALIZE(ierror)
  
  end program

The problem is that this minimal example works exactly as expected, both when run manually using mpirun and when submitted to the cluster like our other heavy calculations.

Now I have three questions: Is closing and reopening STDOUT like this undefined behaviour, so that I am simply lucky in the minimal example? How can there be an End of record in /dev/null? How can I properly fix this problem?


1 answer

mcocdawc (best answer):

The problem has nothing to do with MPI, nor with the cluster upgrade as such.¹ The code itself is problematic: it fails with gfortran and works under ifort only by luck.

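If a file is opened with an explicit record length (recl=...), no write statement may produce a record longer than this, even if the output goes to /dev/null: the limit is enforced by the Fortran runtime library, not by the device. The fix is simply to omit the recl=... argument in the open statement, so that the runtime's default record length applies.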
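A minimal sketch of the fix, assuming a single process without MPI. The program name, the iostat/iomsg check and the string length of 20000 are illustrative choices and not taken from the original code.

```fortran
program reopen_stdout_without_recl
    use iso_fortran_env, only: stdout => output_unit
    implicit none

    integer :: ios
    character(len=256) :: msg

    close(stdout, status="keep")

    ! No recl= specifier: the processor-dependent default
    ! maximum record length applies to the reopened unit.
    open(stdout, file="/dev/null", action="write", iostat=ios, iomsg=msg)
    if (ios /= 0) error stop "could not reopen stdout on /dev/null"

    ! Longer than the recl=8192 used in the question.
    write(stdout, *) repeat('A', 20000)
end program
```

In the MPI program from the question, the same open statement would replace the one inside the rank /= root branch.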

Apparently the runtime library of ifort is more permissive and accepts a write even if the byte length of the written object exceeds the record length specified in the open statement.

In the following example the last write statement fails under gfortran.

program stdout_to_dev_null
    use iso_fortran_env, only: stdout => output_unit
    implicit none(type, external)

    integer, parameter :: rec_length = 10

    write(stdout, *) 'asdf'

    close(stdout, status="keep")

    open(stdout, file="/dev/null")
    block
        integer :: i
        character(:), allocatable :: large_string

        allocate(character(len=rec_length - 1) :: large_string)

        do i = 1, len(large_string)
            large_string(i : i) = 'A'
        end do

        write(stdout, *) large_string

        deallocate(large_string)
        allocate(character(len=rec_length + 1) :: large_string)

        do i = 1, len(large_string)
            large_string(i : i) = 'A'
        end do

        write(stdout, *) large_string
    end block
    close(stdout, status="keep")

    open(stdout, file="/dev/null", recl=rec_length)
    block
        integer :: i
        character(:), allocatable :: large_string

        allocate(character(len=rec_length - 1) :: large_string)

        do i = 1, len(large_string)
            large_string(i : i) = 'A'
        end do

        write(stdout, *) large_string

        deallocate(large_string)
        allocate(character(len=rec_length + 1) :: large_string)

        do i = 1, len(large_string)
            large_string(i : i) = 'A'
        end do

        ! The following statement fails
        write(stdout, *) large_string
    end block
    close(stdout, status="keep")

end program
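
To observe the same failure without aborting the program, one option is to query the record length with inquire and to pass iostat and iomsg to the write statement. This is only a sketch: the program name and the variables max_len, ios and msg are illustrative, and the exact status value and message text depend on the compiler's runtime library.

```fortran
program detect_record_overflow
    use iso_fortran_env, only: stdout => output_unit, stderr => error_unit
    implicit none

    integer, parameter :: rec_length = 10
    integer :: ios, max_len
    character(len=256) :: msg

    close(stdout, status="keep")
    open(stdout, file="/dev/null", recl=rec_length)

    ! For a sequential-access unit, inquire reports the maximum record length.
    inquire(unit=stdout, recl=max_len)
    write(stderr, *) 'maximum record length: ', max_len

    ! With iostat= present, an I/O error sets ios to a nonzero value
    ! instead of terminating the program.
    msg = ''
    write(stdout, *, iostat=ios, iomsg=msg) repeat('A', rec_length + 1)
    if (ios /= 0) then
        write(stderr, *) 'write failed: ', trim(msg)
    end if
end program
```

A runtime that tolerates the oversized record, as ifort apparently does, would leave ios at zero and print only the record length.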

¹ The relevant difference between the old and the new cluster for this problem is that we use gfortran + OpenMPI on the new one and ifort + IntelMPI on the old one.