How can a contained Fortran OpenACC subroutine access data from its parent subroutine?

I am currently accelerating a Fortran code where a contained subroutine (subsub) accesses and modifies variables declared in the parent subroutine (sub):

module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i

    !$acc kernels loop
    do i = 1, 10
      call subsub
    enddo
  contains
    subroutine subsub
      !$acc routine
      var(i) = i
    endsubroutine
  endsubroutine
endmodule

program test
  use mod
  call sub
endprogram

When compiling with the PGI compiler version 20.9-0, it complains that subsub cannot refer to the host variable var:

sub:
      8, Generating implicit copy(.S0000) [if not already present]
      9, Loop is parallelizable
         Generating Tesla code
          9, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
NVFORTRAN-S-0155-acc routine cannot be used for contained subprograms that refer to host subprogram data: var (test.f90)
  0 inform,   0 warnings,   1 severes, 0 fatal for subsub

That makes sense. I tried creating var on the device with !$acc data create(var) or !$acc declare create(var), but the outcome is the same.

Can this pattern be accelerated at all?

Answer by Mat Colgrove (best answer):

No, this pattern won't work. For contained routines, the compiler passes a hidden argument that points to the parent's stack frame. Here that pointer refers to host memory, so dereferencing it from the device would fail.

The workaround is to pass the variables to the subroutine as arguments. For example:

% cat test2.f90
module mod
  implicit none
contains
  subroutine sub
    integer :: var(10)
    integer :: i

    !$acc kernels loop
    do i = 1, 10
      call subsub(var,i)
    enddo
    print *, var
  contains
    subroutine subsub(var,i)
      !$acc routine
      integer :: var(10)
      integer, value :: i
      var(i) = i
    endsubroutine
  endsubroutine
endmodule

program test
  use mod
  call sub
endprogram
% nvfortran test2.f90 -acc -Minfo=accel ; a.out
sub:
      8, Generating implicit copy(.S0000,var(:)) [if not already present]
      9, Loop is parallelizable
         Generating Tesla code
          9, !$acc loop gang, vector(32) ! blockidx%x threadidx%x
subsub:
     14, Generating acc routine seq
         Generating Tesla code
            1            2            3            4            5            6
            7            8            9           10
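
If passing every variable as an argument is impractical, another arrangement that may be worth trying is to move the shared data out of the parent's stack frame entirely. The sketch below is not from the answer above and has not been compiled here; it assumes that var can live at module scope with a declare create directive, and that subsub can become an ordinary module procedure instead of a contained one. The module-scope placement of var, the explicit routine seq clause, and the update self directive are all choices made for this illustration.

```fortran
module mod
  implicit none
  ! Assumption: var is moved from sub to module scope so that it no longer
  ! lives in the parent's (host) stack frame.
  integer :: var(10)
  !$acc declare create(var)
contains
  subroutine sub
    integer :: i

    !$acc kernels loop
    do i = 1, 10
      call subsub(i)
    enddo

    ! declare create allocates var on the device but does not copy it back,
    ! so the host copy is refreshed explicitly before printing.
    !$acc update self(var)
    print *, var
  end subroutine sub

  ! subsub is now a plain module procedure, so no hidden pointer to the
  ! parent's stack frame is needed to reach var.
  subroutine subsub(i)
    !$acc routine seq
    integer, value :: i
    var(i) = i
  end subroutine subsub
end module mod

program test
  use mod
  call sub
end program test
```

It should build with the same command line as above (nvfortran with -acc -Minfo=accel). The trade-off is that var becomes global module state, so this only fits when a single shared instance of the array is acceptable.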