I am currently accelerating a Fortran code where I have a main accelerated loop in subroutine sub. In the loop, I want to call subroutine subsub on the device with acc routine. The subroutine has an intent(out) argument val, which is private in the loop. As subsub has a loop itself, I want to use the vector clause:

    module calc
      implicit none
      public :: sub
      private
    contains
      subroutine sub()
        integer :: i
        integer :: array(10)
        integer :: val
        !$acc kernels loop independent private(val)
        do i = 1, 10
          call subsub(val)
          array(i) = val
        enddo
        print "(10(i0, x))", array
      endsubroutine
      subroutine subsub(val)
        !$acc routine vector
        integer, intent(out) :: val
        integer :: i
        val = 0
        !$acc loop independent reduction(+:val)
        do i = 1, 10
          val = val + 1
        enddo
      endsubroutine
    endmodule

    program test
      use calc, only: sub
      implicit none
      call sub()
    endprogram

When compiling with the PGI compiler version 20.9-0 and running the program, I get garbage values in the variable array. When I simply use acc routine for subsub, I get the correct behavior (10 in all values of array). What is wrong with my approach to parallelizing this subroutine?

It does look like a compiler code generation issue in how val is handled in the main loop. Luckily the workaround is easy: just add an initialization of val in the main loop, before the call to subsub.
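
A minimal sketch of that workaround, assuming it means setting val to zero inside the loop body before the call. Only the line marked below is new; the rest is the code from the question, apart from writing the format descriptor as 1x so that it also builds with stricter compilers. Whether this avoids the problem on other compiler versions is not something the sketch can guarantee.

```fortran
module calc
  implicit none
  public :: sub
  private
contains
  subroutine sub()
    integer :: i
    integer :: array(10)
    integer :: val
    !$acc kernels loop independent private(val)
    do i = 1, 10
      val = 0            ! assumed workaround: initialize the private copy here
      call subsub(val)
      array(i) = val
    enddo
    print "(10(i0, 1x))", array
  endsubroutine

  subroutine subsub(val)
    !$acc routine vector
    integer, intent(out) :: val
    integer :: i
    val = 0
    ! Vector-level sum of ten ones into val
    !$acc loop independent reduction(+:val)
    do i = 1, 10
      val = val + 1
    enddo
  endsubroutine
endmodule

program test
  use calc, only: sub
  implicit none
  call sub()
endprogram
```

The extra assignment is redundant in serial code, since subsub already sets val to 0, but it gives the compiler an explicit write to the private variable in the calling loop. The intended result is the same as in the question: 10 in every element of array.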