I am currently accelerating a Fortran code where a contained subroutine (subsub) accesses and modifies variables declared in the parent subroutine (sub):
module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i
    !$acc kernels loop
    do i = 1, 10
      call subsub
    enddo
  contains
    subroutine subsub
      !$acc routine
      var(i) = i
    endsubroutine
  endsubroutine
endmodule
program test
  use mod
  call sub
endprogram
When compiling with the PGI compiler version 20.9-0, it complains that subsub cannot refer to the host variable var:
sub:
8, Generating implicit copy(.S0000) [if not already present]
9, Loop is parallelizable
Generating Tesla code
9, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
NVFORTRAN-S-0155-acc routine cannot be used for contained subprograms that refer to host subprogram data: var (test.f90)
0 inform, 0 warnings, 1 severes, 0 fatal for subsub
Which makes sense.
I tried to create var on the device with acc data create(var) or acc declare create(var), but it does not change the outcome.
Can this pattern be accelerated at all?
No, this pattern won't work. For contained routines, the compiler passes a hidden argument that points to the parent's stack. In this case, that pointer would refer to host memory, which causes problems when the contained routine tries to access it from the device.
The workaround is to pass the variables to the subroutine as explicit arguments, so that it no longer refers to the parent's data. For example:
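A minimal sketch of this approach, assuming the array and the loop index are passed explicitly. The dummy argument names a and idx, the seq clause on the routine directive, and the final print are illustrative choices rather than part of the original code:

```fortran
module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i
    !$acc kernels loop
    do i = 1, 10
      ! Pass the data explicitly instead of relying on host association
      call subsub(var, i)
    enddo
    print *, var
  contains
    subroutine subsub(a, idx)
      ! Assumption: each call runs sequentially within one device thread
      !$acc routine seq
      integer, intent(inout) :: a(:)
      integer, value :: idx
      ! Only dummy arguments are referenced here, no parent variables
      a(idx) = idx
    endsubroutine
  endsubroutine
endmodule

program test
  use mod
  call sub
endprogram
```

Because subsub now touches only its own dummy arguments, it should no longer need access to the parent's stack. Building with something like nvfortran -acc -Minfo=accel test.f90 should show whether the compiler accepts the routine. If it still rejects the contained form, moving subsub up to be a regular module procedure with the same argument list is another option.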