I'm trying to compute batch 1D FFTs using cufftPlanMany. The data set comes from a 3D field, stored in a 1D array, where I want to compute 1D FFTs in the x and y direction. The data is stored as shown in the figure below; contiguous in x, then y, then z.
Doing batch FFTs in the x-direction is (I believe) straightforward; with input stride=1, distance=nx and batch=ny * nz, it computes the FFTs over elements {0,1,2,3}, {4,5,6,7}, ..., {28,29,30,31}. However, I can't think of a way to achieve the same for the FFTs in the y-direction. A batch for each xy plane is again straightforward (input stride=nx, dist=1, batch=nx results in FFTs over {0,4,8,12}, {1,5,9,13}, etc.). But with batch=nx * nz, going from {3,7,11,15} to {16,20,24,28}, the distance is larger than 1. Can this somehow be done with cufftPlanMany?

I think that the short answer to your question (the possibility of using a single `cufftPlanMany` to perform 1D FFTs of the columns of a 3D matrix) is NO.

Indeed, transforms planned with `cufftPlanMany`, which you call like `cufftPlanMany(&plan, rank, n, inembed, istride, idist, onembed, ostride, odist, type, batch)`, must obey the Advanced Data Layout. In particular, 1D FFTs read element `x` of signal `b` from `input[b * idist + x * istride]`, where `b` addresses the `b`-th signal, `istride` is the distance between two consecutive items in the same signal, and `idist` is the distance between the first elements of two consecutive signals.

If the 3D matrix has dimensions `M * N * Q` (rows of length `M`, `N` rows per slice, `Q` slices) and if you want to perform 1D transforms along the columns, then the distance between two consecutive elements will be `M`, while the distance between two consecutive signals will be `1`. Furthermore, the number of batched executions must be set equal to `M`. With those parameters, you are able to cover only one slice of the 3D matrix. Indeed, if you try increasing the batch count beyond `M`, cuFFT will start computing new column-wise FFTs starting from the second row. The only solution to this problem is to call `cufftExecC2C` iteratively, once per slice, to cover all the `Q` slices.

For the record, the following code provides a fully worked example of how to perform 1D FFTs of the columns of a 3D matrix.
The situation is different when you want to perform 1D transforms of the rows. In that case, the distance between two consecutive elements is `1`, while the distance between two consecutive signals is `M`. This allows you to set a batch of `N * Q` transforms and then invoke `cufftExecC2C` only once. For the record, the code below provides a full example of 1D transforms of the rows of a 3D matrix.