OpenACC routine vector with intent out argument


I am accelerating a Fortran code with OpenACC. The main accelerated loop is in subroutine sub, and inside that loop I want to call subroutine subsub on the device using acc routine. subsub has an intent(out) argument val, which is private in the loop. Because subsub contains a loop of its own, I want to use the vector clause:

module calc
  implicit none
  public :: sub
  private
contains
  subroutine sub()
    integer :: i
    integer :: array(10)
    integer :: val
    !$acc kernels loop independent private(val)
    do i = 1, 10
      call subsub(val)
      array(i) = val
    enddo
    print "(10(i0, x))", array
  endsubroutine
  subroutine subsub(val)
    !$acc routine vector
    integer, intent(out) :: val
    integer :: i
    val = 0
    !$acc loop independent reduction(+:val)
    do i = 1, 10
      val = val + 1
    enddo
  endsubroutine
endmodule

program test
  use calc, only: sub
  implicit none
  call sub()
endprogram

When I compile with the PGI compiler version 20.9-0 and run the program, array contains garbage values. When I use a plain acc routine for subsub instead, I get the correct behavior (10 in every element of array). What is wrong with my approach to parallelizing this subroutine?

Answered by Mat Colgrove

This does look like a compiler code-generation issue in how val is handled in the main loop. Luckily, the workaround is easy: initialize val in the main loop before the call.

% cat test.f90
module calc
  implicit none
  public :: sub
  private
contains
  subroutine sub()
    integer :: i
    integer :: array(10)
    integer :: val
    !$acc kernels loop independent private(val)
    do i = 1, 10
      val = 0
      call subsub(val)
      array(i) = val
    enddo
    print "(10(i0, x))", array
  endsubroutine
  subroutine subsub(val)
    !$acc routine vector
    integer, intent(out) :: val
    integer :: i
    val = 0
    !$acc loop independent reduction(+:val)
    do i = 1, 10
      val = val + 1
    enddo
  endsubroutine
endmodule

program test
  use calc, only: sub
  implicit none
  call sub()
endprogram
% nvfortran -acc -Minfo=accel test.f90 -V20.9 ; a.out
sub:
     10, Generating implicit copyout(array(:)) [if not already present]
     11, Loop is parallelizable
         Generating Tesla code
         11, !$acc loop gang ! blockidx%x
subsub:
     18, Generating Tesla code
         24, !$acc loop vector ! threadidx%x
             Generating reduction(+:val)
             Vector barrier inserted for vector loop reduction
     24, Loop is parallelizable
10 10 10 10 10 10 10 10 10 10
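
For comparison, the question reports that a plain acc routine on subsub also gives the correct result. The following is a minimal sketch of that variant, assuming the plain directive behaves like an explicit routine seq. The names calc_seq and test_seq are made up here so the sketch does not clash with the original module, and the inner loop directive is dropped because a sequential routine has no vector lanes to reduce across. The cost is that the inner loop no longer runs in parallel.

```fortran
module calc_seq
  implicit none
  private
  public :: sub
contains
  subroutine sub()
    integer :: i
    integer :: array(10)
    integer :: val
    !$acc kernels loop independent private(val)
    do i = 1, 10
      call subsub(val)
      array(i) = val
    enddo
    print "(10(i0, 1x))", array
  endsubroutine
  subroutine subsub(val)
    ! Assumption: explicit seq stands in for the plain "acc routine"
    ! mentioned in the question; each caller runs the whole loop itself.
    !$acc routine seq
    integer, intent(out) :: val
    integer :: i
    val = 0
    ! No loop directive here: the loop is executed sequentially.
    do i = 1, 10
      val = val + 1
    enddo
  endsubroutine
endmodule

program test_seq
  use calc_seq, only: sub
  implicit none
  call sub()
endprogram
```

If the inner loop is large enough for vector parallelism to matter, the routine vector version with val initialized in the calling loop, as shown above, is the one to keep.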