Fortran unformatted output with each MPI process writing part of an array


In my parallel program there is a big matrix. Each process computes and stores a part of it. The program then writes the matrix to a file by letting each process write its own part in the correct order. The output file is "unformatted". But when I tried to read the file in a serial code (with the big matrix allocated at the correct size), I got an error which I don't understand.

My question is: in an MPI program, how do you produce the same binary file that a serial program would write, for a big matrix whose parts are stored on different processes?

Here is my attempt:

    if(ThisProcs == RootProcs) then
        open(unit = file_restart%unit, file = file_restart%file, form = 'unformatted')
        write(file_restart%unit)psi
        close(file_restart%unit)
    endif
#ifdef USEMPI
    call mpi_barrier(mpi_comm_world,MPIerr)
#endif
    do i = 1, NProcs - 1
        if(ThisProcs == i) then
            open(unit = file_restart%unit, file = file_restart%file, form = 'unformatted', status = 'old', position = 'append')
            write(file_restart%unit)psi
            close(file_restart%unit)
        endif
#ifdef USEMPI
        call mpi_barrier(mpi_comm_world,MPIerr)
#endif
    enddo

Psi is the big matrix. It is allocated as:

    Psi(N_lattice_points, NPsiStart:NPsiEnd)

But when I tried to load the file in a serial code:

    open(2,file=File1,form="unformatted")
    read(2)psi

I got this error (I am using MSVS 2012 + Intel Fortran 2013):

    forrtl: severe (67): input statement requires too much data, unit 2

How can I fix the parallel part to make the binary file readable for the serial code? Of course one can combine them into one big matrix in the MPI program, but is there an easier way?

Edit 1

The two answers are really nice. I'll use access = "stream" to solve my problem. And I just figured out that I can use inquire to check whether the file is "sequential" or "stream".
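For reference, a minimal sketch of that inquire check. The file name is hypothetical and `newunit=` assumes a Fortran 2008 compiler. One caveat I'm fairly sure of: `ACCESS=` reports how a unit is currently connected, so it tells you what the `open` statement asked for rather than detecting the layout of the bytes already on disk.

```fortran
program inquire_access_demo
  implicit none
  character(len=16) :: acc
  integer :: u

  ! Hypothetical scratch file, connected for stream access.
  open(newunit=u, file='inquire_demo.bin', form='unformatted', &
       access='stream', status='replace')

  ! ACCESS= returns SEQUENTIAL, DIRECT or STREAM for a connected unit.
  inquire(unit=u, access=acc)
  print *, 'unit is connected for access: ', trim(acc)

  if (trim(acc) /= 'STREAM') error stop 'unexpected access mode'
  close(u, status='delete')
end program inquire_access_demo
```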


There are 3 answers

1
francescalus On BEST ANSWER

This isn't a problem specific to MPI, but would also happen in a serial program which took the same approach of writing out chunks piecemeal.

Ignore the opening and closing for each process and look at the overall connection and transfer statements. Your connection is an unformatted file using sequential access. It's unformatted because you explicitly asked for that, and sequential because you didn't ask for anything else.

Sequential file access is based on records. Each of your write statements transfers out a record consisting of a chunk of the matrix. Conversely, your input statement attempts to read from a single record.

Your problem is that, while you try to read the entire matrix from the first record of the file, that record doesn't contain the whole matrix. It doesn't contain anything like the correct amount of data. End result: "input statement requires too much data".

So, you need to either read in the data based on the same record structure, or move away from record files.

The latter is simple: use stream access, for the writes as well as for the read.

    open(unit = file_restart%unit, file = file_restart%file,  &
         form = 'unformatted', access='stream')
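To illustrate, here is a serial stand-in for the piecewise write, as a sketch only: the sizes, the file name and the even split of columns over three pretend ranks are all assumptions, and `newunit=` / `error stop` need a Fortran 2008 compiler. Each "rank" appends its block of columns to a stream file, and a single read then recovers the whole matrix. This relies on the matrix being split along its last index, as in the question, so that each rank's block is contiguous in column-major order.

```fortran
program stream_chunks_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n_rows = 4, n_cols = 6, n_ranks = 3   ! assumed sizes
  integer, parameter :: cols_per_rank = n_cols / n_ranks
  character(len=*), parameter :: fname = 'psi_stream.bin'     ! hypothetical name
  real(dp) :: psi(n_rows, n_cols), psi_in(n_rows, n_cols)
  integer :: u, rank, first, last

  call random_number(psi)

  ! "Rank 0" creates the file, the others append, as in the question.
  do rank = 0, n_ranks - 1
    first = rank*cols_per_rank + 1
    last  = first + cols_per_rank - 1
    if (rank == 0) then
      open(newunit=u, file=fname, form='unformatted', access='stream', &
           status='replace')
    else
      open(newunit=u, file=fname, form='unformatted', access='stream', &
           status='old', position='append')
    end if
    write(u) psi(:, first:last)
    close(u)
  end do

  ! A stream file carries no record markers, so one read takes everything.
  open(newunit=u, file=fname, form='unformatted', access='stream', &
       status='old')
  read(u) psi_in
  close(u, status='delete')

  if (any(psi_in /= psi)) error stop 'round trip failed'
  print *, 'round trip ok'
end program stream_chunks_demo
```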

Alternatively, read with a similar loop structure:

    do i=1, NPROCS
      ! read statement with a slice
    end do

This of course requires understanding the correct slicing.
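A sketch of what that loop might look like, under the assumption that every rank owns the same number of columns (the sizes, the file name and the three pretend ranks are made up for the example). The writer here is serial, but it produces the same one-record-per-rank layout as the code in the question.

```fortran
program sequential_slices_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n_rows = 4, n_cols = 6, n_ranks = 3   ! assumed sizes
  integer, parameter :: cols_per_rank = n_cols / n_ranks
  character(len=*), parameter :: fname = 'psi_seq.bin'        ! hypothetical name
  real(dp) :: psi(n_rows, n_cols), psi_in(n_rows, n_cols)
  integer :: u, rank, first, last

  call random_number(psi)

  ! Write one sequential record per "rank".
  open(newunit=u, file=fname, form='unformatted', status='replace')
  do rank = 0, n_ranks - 1
    first = rank*cols_per_rank + 1
    last  = first + cols_per_rank - 1
    write(u) psi(:, first:last)
  end do
  close(u)

  ! Read it back record by record, each into the matching slice.
  open(newunit=u, file=fname, form='unformatted', status='old')
  do rank = 0, n_ranks - 1
    first = rank*cols_per_rank + 1
    last  = first + cols_per_rank - 1
    read(u) psi_in(:, first:last)
  end do
  close(u, status='delete')

  if (any(psi_in /= psi)) error stop 'round trip failed'
  print *, 'round trip ok'
end program sequential_slices_demo
```

If the ranks own different numbers of columns, the reader needs the same `NPsiStart:NPsiEnd` bounds that each rank used when writing.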

Alternatively, one can consider using MPI-IO for output, which is very similar to using stream output. Read this back in with stream access. You can find out more about this concept elsewhere on SO.

0
casey On

Fortran unformatted sequential writes are not quite raw data. Each write stores additional information before and after the record's data, in a processor-dependent form. A read cannot request more data than the record it reads contains. This means that if psi is written in two writes, you will need to read it back in two reads; you cannot read it all at once.
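One way to see that overhead on your own compiler is to write the same data once through sequential access and once through stream access and compare the file sizes. This is only a sketch: the array size, the number of writes and the file names are arbitrary, and `inquire(size=)` reports file storage units, which are bytes on most but not necessarily all processors. The output is deliberately not quoted here, since the record bookkeeping is processor dependent.

```fortran
program record_overhead_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 100, n_writes = 4          ! assumed sizes
  character(len=*), parameter :: f_seq = 'overhead_seq.bin'      ! hypothetical
  character(len=*), parameter :: f_str = 'overhead_stream.bin'   ! hypothetical
  real(dp) :: chunk(n)
  integer :: u, k, size_seq, size_str

  chunk = 1.0_dp

  ! Same data, written as n_writes sequential records ...
  open(newunit=u, file=f_seq, form='unformatted', access='sequential', &
       status='replace')
  do k = 1, n_writes
    write(u) chunk
  end do
  close(u)

  ! ... and as n_writes stream transfers.
  open(newunit=u, file=f_str, form='unformatted', access='stream', &
       status='replace')
  do k = 1, n_writes
    write(u) chunk
  end do
  close(u)

  inquire(file=f_seq, size=size_seq)
  inquire(file=f_str, size=size_str)
  print *, 'sequential file size:', size_seq
  print *, 'stream file size:    ', size_str
  print *, 'extra per record:    ', (size_seq - size_str) / n_writes

  ! Clean up the scratch files.
  open(newunit=u, file=f_seq, status='old'); close(u, status='delete')
  open(newunit=u, file=f_str, status='old'); close(u, status='delete')
end program record_overhead_demo
```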

Perhaps the most straightforward option is to use stream access instead of sequential. A stream file is indexed by bytes (generally) and does not contain record start and end information. Using this access method you can split the write but read all at once. Stream access is a feature of Fortran 2003.

If you stick with sequential access, you'll need to know how many MPI ranks wrote the file and loop over properly sized records to read the data as it was written. You could make the user specify the number of ranks or store that as the first record in the file and read that first to determine how to read the rest of the data.
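A sketch of the header-record idea, assuming the columns divide evenly among the ranks (the sizes, the file name and the rank count of 3 are placeholders; in the real program the count would be NProcs and the root process would write the header before its own data):

```fortran
program rank_header_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n_rows = 4, n_cols = 6                ! assumed sizes
  character(len=*), parameter :: fname = 'psi_header.bin'     ! hypothetical name
  real(dp) :: psi(n_rows, n_cols), psi_in(n_rows, n_cols)
  integer :: u, r, n_ranks, n_ranks_in, w, first

  call random_number(psi)
  n_ranks = 3                    ! stand-in for NProcs in the MPI run
  w = n_cols / n_ranks           ! columns per rank

  ! Writer: header record first, then one record per rank.
  open(newunit=u, file=fname, form='unformatted', status='replace')
  write(u) n_ranks
  do r = 0, n_ranks - 1
    first = r*w + 1
    write(u) psi(:, first:first + w - 1)
  end do
  close(u)

  ! Reader: learns the rank count from the file, then sizes its reads.
  open(newunit=u, file=fname, form='unformatted', status='old')
  read(u) n_ranks_in
  w = n_cols / n_ranks_in
  do r = 0, n_ranks_in - 1
    first = r*w + 1
    read(u) psi_in(:, first:first + w - 1)
  end do
  close(u, status='delete')

  if (n_ranks_in /= n_ranks .or. any(psi_in /= psi)) &
    error stop 'round trip failed'
  print *, 'read back', n_ranks_in, 'records'
end program rank_header_demo
```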

1
Rob Latham On

If you are writing MPI, why not MPI-IO? Each process calls MPI_File_set_view to set a subarray view of the file, and then all processes write their data collectively with MPI_FILE_WRITE_ALL. This approach is likely to scale really well on big machines (though your approach will be fine up to, oh, maybe 100 processors).
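A rough sketch of that pattern, not tuned and not tested against the asker's code. It assumes double precision data, the same number of columns on every rank, and a made-up file name; the local array here is filled with the rank number just so the blocks are distinguishable. It needs an MPI library and must be launched with mpirun/mpiexec.

```fortran
program mpiio_subarray_sketch
  use mpi
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n_rows = 4, cols_per_rank = 2         ! assumed sizes
  character(len=*), parameter :: fname = 'psi_mpiio.bin'      ! hypothetical name
  real(dp), allocatable :: psi(:,:)
  integer :: ierr, rank, nprocs, fh, filetype
  integer :: sizes(2), subsizes(2), starts(2)
  integer(kind=MPI_OFFSET_KIND) :: disp

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

  ! Local block of columns owned by this rank.
  allocate(psi(n_rows, cols_per_rank))
  psi = real(rank, dp)

  ! Describe where the local block sits inside the global matrix.
  sizes    = [n_rows, cols_per_rank*nprocs]   ! global shape
  subsizes = [n_rows, cols_per_rank]          ! local shape
  starts   = [0, rank*cols_per_rank]          ! zero-based offsets
  call MPI_Type_create_subarray(2, sizes, subsizes, starts, &
                                MPI_ORDER_FORTRAN, MPI_DOUBLE_PRECISION, &
                                filetype, ierr)
  call MPI_Type_commit(filetype, ierr)

  call MPI_File_open(MPI_COMM_WORLD, fname, &
                     ior(MPI_MODE_CREATE, MPI_MODE_WRONLY), &
                     MPI_INFO_NULL, fh, ierr)
  disp = 0
  call MPI_File_set_view(fh, disp, MPI_DOUBLE_PRECISION, filetype, &
                         'native', MPI_INFO_NULL, ierr)

  ! Collective write: every rank in the communicator must make this call.
  call MPI_File_write_all(fh, psi, size(psi), MPI_DOUBLE_PRECISION, &
                          MPI_STATUS_IGNORE, ierr)

  call MPI_File_close(fh, ierr)
  call MPI_Type_free(filetype, ierr)
  call MPI_Finalize(ierr)
end program mpiio_subarray_sketch
```

The resulting file contains only the matrix data, with no record markers, so the serial code would read it with `access='stream'` rather than the default sequential access.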