I am trying to offload an expensive loop in my program to an Intel MIC coprocessor using OpenMP target directives. The relevant part of the code is:
!$omp target map(to:coor,sigma_const,clase) map(tofrom:ener1,ener2)
!$omp parallel private(i,j,fdummy1,k,l,fdummy2,fdummy3,fdummy4,fdummy5,dist)
!$omp do reduction(+:ener1)
do i=1,num_res-2
   do j=i+2,num_res
      ! distance between residues i and j
      fdummy1=coor(i,1,qk)-coor(j,1,qk)
      fdummy2=coor(i,2,qk)-coor(j,2,qk)
      fdummy3=coor(i,3,qk)-coor(j,3,qk)
      dist=sqrt(fdummy1*fdummy1+fdummy2*fdummy2+fdummy3*fdummy3)
      fdummy1=sigma_const(i,j)
      write(6,*) 'fdum',fdummy1   ! debug print: this shows 0
      k=clase(i)
      l=clase(j)
      fdummy2=fdummy1*fdummy1 ! power 2
      fdummy3=fdummy2*fdummy2 ! power 4
      fdummy4=fdummy2*fdummy3 ! power 6
      fdummy5=fdummy4*fdummy4 ! power 12
      fdummy1=fdummy5-fdummy4
      ener1=ener1+eps_const(k,l)*fdummy1
   enddo
enddo
!$omp end do
!$omp do reduction(+:ener2)
do i=1,num_res-1
   ! distance between consecutive residues i and i+1
   fdummy1=coor(i,1,qk)-coor(i+1,1,qk)
   fdummy2=coor(i,2,qk)-coor(i+1,2,qk)
   fdummy3=coor(i,3,qk)-coor(i+1,3,qk)
   dist=sqrt(fdummy1*fdummy1+fdummy2*fdummy2+fdummy3*fdummy3)
   fdummy1=(dist-r_cero)
   fdummy2=fdummy1*fdummy1
   ener2=ener2+fdummy2
enddo
!$omp end do
!$omp end parallel
!$omp end target
When I print the values of the sigma_const array inside the target region, I get 0. I don't understand why, because I map the array to the MIC. I am also confused about the order of the !$omp target map / !$omp target directives with respect to the !$omp parallel one. In some examples I saw on the internet, people put the target directive outside the parallel region, but apparently target can also be used inside a parallel region (see for instance: https://software.intel.com/en-us/forums/intel-visual-fortran-compiler-for-windows/topic/516606).
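To make the ordering question concrete, here is a minimal sketch of the pattern I believe is the usual one: target outermost, with the parallel loop nested inside it, so that the whole parallel region runs on the device. The program name, the array x, and the size n are made up for illustration and are not from my real code. I have not confirmed that this is the recommended arrangement for the MIC.

```fortran
program target_order_sketch
  implicit none
  integer, parameter :: n = 1000      ! hypothetical problem size
  integer :: i
  double precision :: x(n), total

  ! fill a dummy array on the host
  do i = 1, n
     x(i) = dble(i)
  end do
  total = 0.0d0

  ! target is the outermost construct: x is copied to the device,
  ! total is copied in and back out
  !$omp target map(to: x) map(tofrom: total)
  ! the parallel loop is nested inside the target region
  !$omp parallel do reduction(+:total)
  do i = 1, n
     total = total + x(i)
  end do
  !$omp end parallel do
  !$omp end target

  ! total should equal n*(n+1)/2 if the mapping worked
  print *, 'total =', total
end program target_order_sketch
```

If no device is available, my understanding is that the target region simply runs on the host, so the sketch should still build with an OpenMP flag such as -qopenmp (Intel) or -fopenmp (gfortran).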
Another question: is it possible for the MIC to keep the values of arrays such as sigma_const and clase (which do not change during the simulation), so that I don't need to transfer them at every simulation time step?
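What I have in mind is something like the sketch below, assuming the OpenMP target data construct is the right tool for this. The constant array is mapped once around the time-step loop, and each inner target region maps only the data that changes. The names sigma, coor, ener, n and nsteps are placeholders rather than my real variables, and I have not checked how this behaves on the MIC.

```fortran
program target_data_sketch
  implicit none
  integer, parameter :: n = 1000, nsteps = 10   ! hypothetical sizes
  integer :: i, step
  double precision :: sigma(n), coor(n), ener

  sigma = 1.0d0    ! constant during the whole simulation

  ! sigma is transferred once, when the data region is entered,
  ! and stays on the device until "end target data"
  !$omp target data map(to: sigma)
  do step = 1, nsteps
     ! placeholder for the host-side update of the coordinates
     do i = 1, n
        coor(i) = dble(i + step)
     end do
     ener = 0.0d0

     ! only the per-step data is mapped here; sigma is already
     ! present on the device, so it should not be copied again
     !$omp target map(to: coor) map(tofrom: ener)
     !$omp parallel do reduction(+:ener)
     do i = 1, n
        ener = ener + sigma(i) * coor(i)
     end do
     !$omp end parallel do
     !$omp end target

     print *, 'step', step, 'ener', ener
  end do
  !$omp end target data
end program target_data_sketch
```

In my real code, sigma_const and clase would go in the outer map(to:...) clause, and coor, ener1 and ener2 would stay on the inner target directive.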