Fortran Seg Fault when assigning Matrices

164 views Asked by At

[UPDATE] the code and a few sentences were changed to reflect a realization explained in my second comment. The code should compile with the line below, however, I have an older gfortran and may not be seeing some errors that you might.

gfortran BLU_implementation_copy.f90 -o BLU_implementation_copy.x

I'm getting an incredibly inconsistent seg fault error at runtime with Fortran 90.

The overall objective of my code is to factor a randomly generated, complex, symmetric, diagonally-dominant matrix in blocks that can be easily parallelized. A stripped down version (except for the relevant functions) can be found at the end.

I'm running into the error when breaking up the original matrix into blocks (tiles) in the function mat2tiles. Specifically this line of code keeps failing:

o(i,j)=cell(x(a:min(i*Nblock,M), b:min(j*Nblock, N)))

This is assigning a block of the input matrix X to matrix o at index i,j. It accomplishes this because each index of o is a matrix NblockxNblock through the derived type:

type cell complex*16 :: block(Nblock,Nblock) end type cell

type(cell) :: o(M/Nblock+rem,N/Nblock+rem)

For certain matrix sizes and tile sizes, everything works perfectly. For larger sizes, the error starts to appear in tile sizes close to the matrix size and eventually reach those at half the size of the matrix. For example, the smallest instance of this happening I have found is a 20x20 matrix with a tile size of 18. But this doesn't happen consistently. If I try out smaller matrices that all run normally and build up to that size, it runs. If I do a larger size that seg faults and then run the 20x20 with 18, it seg faults. The smallest consistent parameters I've found have been 25x25 with a tile size of 23. Yet even with this size, if I print out the block where it fails after the o(i,j) assignment line:

print*, o(2,1)%block

It suddenly runs through the whole thing without a problem. Taking out the print statement makes it seg fault again. Finally, the most annoying one [fixed - refer to second comment by me], if I do a 2000x2000 matrix with a tile size of 1000 or more, I get a seg fault just after getting into the function (which means it's occurring during the variable allocations). This leads me to believe the issue may stem from how the matrix is allocated, especially since I'm using a derived type. But when I try to diagnose the size and contents of the matrix, everything seems normal.

program BLU_implementation
    implicit none
    integer :: rem
    integer, parameter :: M=20, N=20, Nblock=19
    real*8 :: start, finish
    complex*16 :: A(M,N)

    type cell
    complex*16 :: block(Nblock,Nblock)
    end type cell

!determines if cell matrix doesn't need to be bigger
    rem=1
    if (modulo(M,Nblock)==0) then
        rem=0
    endif

    call cpu_time(start)
    call functionCalling(A, M, N, Nblock, rem)
    call cpu_time(finish)
    print*, 'overall time:', finish-start, 'seconds'

contains
!==================================================================================================================
subroutine functionCalling(A, M, N, Nblock, rem)
    implicit none
    integer :: IPIV, INFO, M, N, Nblock, rem
    real*8 :: start, finish
    complex*16 :: A(M,N)

    type(cell) :: C(M/Nblock+rem,N/Nblock+rem)

    call cpu_time(start)

    A= CSPDmatrixFill(A,M,N)

    call cpu_time(finish)
    print*, 'matrix fill time:', finish-start, 'seconds'


    call cpu_time(start)

    C= mat2tiles(A,M,N,Nblock,rem) 

    call cpu_time(finish)
    print*, 'tiling time:', finish-start, 'seconds'

end subroutine

!===================================================================================================================
! generates a complex, symmetric, positive-definite matrix (based off of another's code)

function CSPDmatrixFill(A, M, N) result (Matrix)

! initialization  
      implicit none
      integer :: i, j
      integer :: M, N
      real*8 :: x, xi
      complex*16 :: A(M, N), Matrix(M, N), EYE(M, N), MT(N, M)

      EYE=0
      forall(j=1:M) EYE(j,j)=1

! execution

      call random_seed

      do i=1, M
        do j=1, N
          call random_number (x)
            call random_number(xi)
            Matrix(i,j) = cmplx(x,xi)
       end do
      end do  

! construct a symmetric matrix ( O(n^2) )
      call Mtranspose(Matrix, M, N)
      Matrix = Matrix+MT  

! make positive definite (diagonally dominant)
        Matrix = Matrix + N*EYE

end function CSPDmatrixFill

!======================================================================================================
subroutine Mtranspose(A, i, j)
! takes a matrix and the two parameters used to make the matrix: A(i,j)
! returns a matrix with switched indices: A(j,i)
      implicit none

      integer :: i, j
      complex*16 :: A(i,j), MT(j,i)

      MT=A(j,i)
      return
end subroutine Mtranspose

!=======================================================================================================
!MAT2TILES - breaks up an array into a cell array of adjacent sub-arrays of equal sizes
!
! O=mat2tiles(X,M,N,Nblock)
!
!will produce a cell array o containing adjacent chunks of the array X(M,N)
!with each chunk of dimensions NblockxNblock. If Nblock does
!not divide evenly into size(X,i), then the chunks at the upper boundary of
!X along dimension i will have bogus values that do not affect the factorization
!in the places where the matrix doesn't occupy. (according to older versions. Might have changed with some edits)
!
function mat2tiles(X,M,N,Nblock,rem) result(o)

! initialization  
      implicit none
      integer :: a, b, i, j, M, N, Nblock, rem
      complex*16 :: X(M,N)

      type(cell) :: o(M/Nblock+rem,N/Nblock+rem)

! diagnostic print statements
        print*,size(o(1,1)%block), size(o(1,2)%block), size(o(2,1)%block), size(o(2,2)%block)
        print*, 'got to start'

! turn matrix x into cell matrix o

      do j=1, N/Nblock+rem
        if (j==1) then
          b=j
        else 
          b=b+Nblock
        endif       
        do i=1, M/Nblock+rem
          if (i==1) then
            a=i
          else 
            a=a+Nblock
          endif
! diagnostic print statement
          print*, 'writing to o: i:', i, 'j:', j, 'i*Nblock:', i*Nblock, 'j*Nblock:', j*Nblock, 'min of i:', min(i*Nblock, M), &
                    'min of j:', min(j*Nblock, N), 'a:', a, 'b:', b

          o(i,j)=cell(x(a:min(i*Nblock,M), b:min(j*Nblock, N))) 
        enddo
      enddo
! diagnostic print statement
      print*, 'got to end'
      return 
end function mat2tiles

!==================================================================================================
end program
1

There are 1 answers

0
achilles0613 On BEST ANSWER

After narrowing down the issue to using numbers vs. variables of those same numbers, it was discovered that Fortran doesn't like assigning a matrix to a matrix of different dimensions even if it fit inside. This was weird because smaller values of M, N, and Nblock got around the issue perfectly.

The solution was simply defining o(i,j)%block(1:Nblock,1:Nblock)=x(dim1:dim2,etc) instead of o(i,j)=cell(x(dim1:dim2,etc) for i=1 and j=1 and having a slightly modified instance of each for all cases of i and j.

Correct code (for all cases of matrix size and tile size) looks like:

if (i==M/Nblock+rem .AND. j==N/Nblock+rem .AND. rem==1) then 
    o(i,j)%block(1:M-(i-1)*Nblock,1:N-(j-1)*Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else if (i==M/Nblock+rem .AND. j/=N/Nblock+rem .AND. rem==1) then
    o(i,j)%block(1:M-(i-1)*Nblock,1:Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else if (i/=M/Nblock+rem .AND. j==N/Nblock+rem .AND. rem==1) then
    o(i,j)%block(1:Nblock,1:N-(j-1)*Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
else
    o(i,j)%block(1:Nblock,1:Nblock)=x(a:min(i*Nblock,M), b:min(j*Nblock, N))
end if

With this correction, my code runs on the old and new versions of gfortran. Interesting to note, however, is how the Mac OS X version of gfortran never ran into this segfault, only the Linux version.