Dear all, I tried to find an answer by googling, but I haven't been able to find one.
I'm using FFTW in an MPI Fortran application. I need to compute forward and backward transforms of a 3D array of tensors, component by component, and compute some complex tensorial quantities while in Fourier space. To make the array usable by FFTW without spending a lot of time moving data between arrays, the option that came to mind was to declare a 5-dimensional array, i.e.:
use, intrinsic :: iso_c_binding
implicit none
include 'mpif.h'
include 'fftw3-mpi.f03'

integer(C_INTPTR_T), parameter :: FFTDIM=3 !fft dimension
integer(C_INTPTR_T) :: fft_L               !x direction
integer(C_INTPTR_T) :: fft_M               !y direction
integer(C_INTPTR_T) :: fft_N               !z direction
complex(C_DOUBLE_COMPLEX), pointer :: fft_in(:,:,:,:,:), fft_out(:,:,:,:,:)
type(C_PTR) :: fft_plan_fwd, fft_plan_bkw, fft_datapointer
integer(C_INTPTR_T) :: fft_alloc_local, fft_local_n0, fft_local_0_start
integer :: mpi_rank, mpi_size, mpi_err

call MPI_INIT( mpi_err )
call MPI_COMM_RANK( MPI_COMM_WORLD, mpi_rank, mpi_err )
call MPI_COMM_SIZE( MPI_COMM_WORLD, mpi_size, mpi_err )
call fftw_mpi_init
fft_L=problem_dim(1)
fft_M=problem_dim(2)
fft_N=problem_dim(3)
! CALCULATE LOCAL SIZE OF FFT VARIABLE FOR EACH COMPONENT
fft_alloc_local = fftw_mpi_local_size_3d(fft_N,fft_M,fft_L, MPI_COMM_WORLD, &
fft_local_n0, fft_local_0_start)
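! (fftw_mpi_local_size_3d splits the slowest dimension, fft_N, into slabs:
!  this rank owns fft_local_n0 planes starting at index fft_local_0_start)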
! allocate one contiguous block large enough for all 9 tensor components
fft_datapointer = fftw_alloc_complex(9*int(fft_alloc_local,C_SIZE_T))
! link both pointers to the same buffer (the transform is in place)
call c_f_pointer(fft_datapointer, fft_in, [ FFTDIM, FFTDIM, fft_L, fft_M, fft_local_n0])
call c_f_pointer(fft_datapointer, fft_out, [ FFTDIM, FFTDIM, fft_L, fft_M, fft_local_n0])
! create plans
fft_plan_fwd = fftw_mpi_plan_dft_3d(fft_N, fft_M, fft_L, &                    !dimensions
                                    fft_in(1,1,:,:,:), fft_out(1,1,:,:,:), &  !input, output
                                    MPI_COMM_WORLD, FFTW_FORWARD, FFTW_MEASURE)
fft_plan_bkw = fftw_mpi_plan_dft_3d(fft_N, fft_M, fft_L, &                    !dimensions
                                    fft_in(1,1,:,:,:), fft_out(1,1,:,:,:), &  !input, output
                                    MPI_COMM_WORLD, FFTW_BACKWARD, FFTW_MEASURE)
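A minimal sketch of how the plans are then executed, component by component (fftw_mpi_execute_dft is FFTW's new-array execute routine for MPI plans; the integer loop indices i and j are not declared in the snippet above):

do i = 1, FFTDIM
   do j = 1, FFTDIM
      ! each section fft_in(i,j,:,:,:) is one tensor component
      call fftw_mpi_execute_dft(fft_plan_fwd, fft_in(i,j,:,:,:), fft_out(i,j,:,:,:))
   end do
end do
! ... compute the tensorial quantities in Fourier space, then transform back ...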
Now, if I use this piece of code and the number of processors is a power of 2 (2, 4, 8, ...), everything works fine, but if I use 6 the application gives an error. How can I solve this issue? And is there a better strategy than allocating a 5D array, one that still avoids moving too much data?
I found the solution to this problem using the fftw_mpi_plan_many_dft interface; the code performing the computation follows below. It computes a 3D (LxMxN) complex-to-complex transform of a tensor, component by component (11, 12, ...), using FFTW's MPI capabilities. The extent of the third dimension (N) must be divisible by the number of cores used.
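A minimal sketch of the approach: a single fftw_mpi_plan_many_dft plan transforms all nine interleaved tensor components at once. The grid extents (32 in each direction) and the standalone program scaffolding are placeholder assumptions, not part of my original application, and normalization after the backward transform is left to the caller.

program tensor_fft_many
   use, intrinsic :: iso_c_binding
   implicit none
   include 'mpif.h'
   include 'fftw3-mpi.f03'

   integer(C_INTPTR_T), parameter :: FFTDIM = 3
   integer(C_INTPTR_T), parameter :: howmany = FFTDIM*FFTDIM   ! 9 tensor components
   integer(C_INTPTR_T) :: fft_L, fft_M, fft_N
   integer(C_INTPTR_T) :: fft_alloc_local, fft_local_n0, fft_local_0_start
   complex(C_DOUBLE_COMPLEX), pointer :: fft_in(:,:,:,:,:), fft_out(:,:,:,:,:)
   type(C_PTR) :: fft_plan_fwd, fft_plan_bkw, fft_datapointer
   integer :: mpi_err

   call MPI_INIT(mpi_err)
   call fftw_mpi_init()

   ! placeholder extents; fft_N must be divisible by the number of ranks
   fft_L = 32; fft_M = 32; fft_N = 32

   ! local size for 9 interleaved transforms; the returned count already
   ! includes the factor howmany, so no extra *9 is needed when allocating
   fft_alloc_local = fftw_mpi_local_size_many(3_C_INT, [fft_N, fft_M, fft_L], &
                     howmany, FFTW_MPI_DEFAULT_BLOCK, MPI_COMM_WORLD, &
                     fft_local_n0, fft_local_0_start)

   fft_datapointer = fftw_alloc_complex(int(fft_alloc_local, C_SIZE_T))

   ! both views share one buffer (in-place transform); the tensor indices
   ! vary fastest, matching FFTW's interleaved layout for "many" transforms
   call c_f_pointer(fft_datapointer, fft_in,  [FFTDIM, FFTDIM, fft_L, fft_M, fft_local_n0])
   call c_f_pointer(fft_datapointer, fft_out, [FFTDIM, FFTDIM, fft_L, fft_M, fft_local_n0])

   ! a single plan per direction transforms all 9 components in one call
   fft_plan_fwd = fftw_mpi_plan_many_dft(3_C_INT, [fft_N, fft_M, fft_L], howmany, &
                  FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK, &
                  fft_in, fft_out, MPI_COMM_WORLD, FFTW_FORWARD, FFTW_MEASURE)
   fft_plan_bkw = fftw_mpi_plan_many_dft(3_C_INT, [fft_N, fft_M, fft_L], howmany, &
                  FFTW_MPI_DEFAULT_BLOCK, FFTW_MPI_DEFAULT_BLOCK, &
                  fft_out, fft_in, MPI_COMM_WORLD, FFTW_BACKWARD, FFTW_MEASURE)

   ! ... fill fft_in(i,j,x,y,z_local), then:
   call fftw_mpi_execute_dft(fft_plan_fwd, fft_in, fft_out)
   ! ... compute the tensorial quantities in Fourier space ...
   call fftw_mpi_execute_dft(fft_plan_bkw, fft_out, fft_in)
   ! remember FFTW does not normalize: divide by fft_L*fft_M*fft_N

   call fftw_destroy_plan(fft_plan_fwd)
   call fftw_destroy_plan(fft_plan_bkw)
   call fftw_free(fft_datapointer)
   call MPI_FINALIZE(mpi_err)
end program tensor_fft_many

Since all nine transforms go through one distributed plan, whole contiguous arrays are handed to FFTW, and no non-contiguous sections such as fft_in(1,1,:,:,:) are ever passed to the planner.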
Thanks everyone for the help!