I am running a batched inversion of a large number of 3x3 matrices with CUDA.

The goal is to get a large array of 3x3 matrices (so I use a 4D array).

I previously did the same operation with the numpy.linalg.inv function. That way, I directly get an array of 3x3 matrices; the code that performs this operation is shown below.

Now, with the CUDA version, I would like to reshape the large 1D array it produces in as few instructions as possible: I have to build a (N,N,3,3) array from a 1D array of N*N*3*3 elements.

For the moment, I can do this reshape in 2 steps (code below).

The original version, with the classical numpy.linalg.inv, is carried out by:

    for r_p in range(N):
      for s_p in range(N):
        # original version (without GPU)
        invCrossMatrix[:,:,r_p,s_p] = np.linalg.inv(arrayFullCross_vec[:,:,r_p,s_p])

invCrossMatrix is a (3,3,N,N) array, and I get it directly from the (3,3,N,N) arrayFullCross_vec array (dimBlocks = 3).
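As a point of comparison for the loop above, here is a minimal CPU-only sketch. It assumes a NumPy version whose np.linalg.inv accepts stacked inputs of shape (..., M, M), so the double loop can be replaced by one call once the 3x3 axes are moved to the end. The random test data and the names `a`, `inv_loop` and `inv_stacked` are hypothetical stand-ins, not the real arrayFullCross_vec.

```python
import numpy as np

N = 4
rng = np.random.default_rng(0)

# Hypothetical (3, 3, N, N) input; adding 3*I makes every 3x3 block
# strictly diagonally dominant, hence invertible.
a = rng.random((3, 3, N, N)) + 3.0 * np.eye(3)[:, :, None, None]

# Loop version, as in the question
inv_loop = np.zeros((3, 3, N, N))
for r_p in range(N):
    for s_p in range(N):
        inv_loop[:, :, r_p, s_p] = np.linalg.inv(a[:, :, r_p, s_p])

# Stacked version: move the 3x3 axes last, invert all blocks at once,
# then move them back to the front.
stacked = a.transpose(2, 3, 0, 1)                      # shape (N, N, 3, 3)
inv_stacked = np.linalg.inv(stacked).transpose(2, 3, 0, 1)  # shape (3, 3, N, N)

print(np.allclose(inv_loop, inv_stacked))
```

This is only a CPU reference to check the GPU result against; it says nothing about how gpuinv3x3 lays out its data.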

For the moment, when I use the GPU batch execution, I start from the 1D array:

    # Declaration of inverse cross matrix
    invCrossMatrix_temp = np.zeros((N**2,3,3))

    # Create arrayFullCross_vec array
    arrayFullCross_vec = np.zeros((3,3,N,N))

    # Create invCrossMatrix_gpu array (flat 1D buffer for the GPU result)
    invCrossMatrix_gpu = np.zeros((3*3*(N**2)))

    # Build observables covariance matrix
    arrayFullCross_vec = buildObsCovarianceMatrix3_vec(k_ref, mu_ref, ir)

    ## Performing batch inversion 3x3 :
    invCrossMatrix_gpu = gpuinv3x3(arrayFullCross_vec.flatten('F'),N**2)

    ## First reshape
    invCrossMatrix_temp = invCrossMatrix_gpu.reshape(N**2,3,3)
    # Second reshape : don't forget ".T" transpose operator
    invCrossMatrix = (invCrossMatrix_temp.reshape(N,N,3,3)).T

Question 1) By the way, why is the 'F' argument in flatten('F') necessary?

If I only do gpuinv3x3(arrayFullCross_vec.flatten(),N**2), the code doesn't work: is Python maybe column-major like Fortran?
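To make the ordering question concrete, here is a small sketch of what the two flatten orders do to a (3,3,N,N) array. NumPy's default is row-major ('C') order, where the last axis varies fastest; 'F' makes the first axis vary fastest. The array `a` is a hypothetical stand-in for arrayFullCross_vec, and whether gpuinv3x3 really expects each matrix as 9 contiguous values is an assumption.

```python
import numpy as np

N = 2
# Hypothetical stand-in for arrayFullCross_vec, shape (3, 3, N, N)
a = np.arange(3 * 3 * N * N).reshape(3, 3, N, N)

flat_c = a.flatten()      # default order='C': last axis varies fastest
flat_f = a.flatten('F')   # order='F': first axis varies fastest

# With 'F', the first 9 values are exactly the matrix a[:, :, 0, 0],
# stored column by column.
same_matrix = np.array_equal(
    flat_f[:9].reshape(3, 3, order='F'), a[:, :, 0, 0]
)

# With 'C', the first 9 values come from several different matrices.
mixed = not np.array_equal(
    np.sort(flat_c[:9]), np.sort(a[:, :, 0, 0].ravel())
)

print(same_matrix, mixed)
```

So with the 3x3 axes in front, only the 'F' flattening keeps each matrix in one contiguous block of 9 values.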

Question 2) Now, I would like to convert the following block:

    ## First reshape
    invCrossMatrix_temp = invCrossMatrix_gpu.reshape(N**2,3,3)
    # Second reshape : don't forget ".T" transpose operator
    invCrossMatrix = (invCrossMatrix_temp.reshape(N,N,3,3)).T

into a single reshape instruction: is it possible?

The issue is to convert the 1D array invCrossMatrix_gpu (N**2 * 3 * 3 elements) directly into a (3,3,N,N) array.

I would like to reshape the original 1D array in a single step, since I call these routines many times.
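One candidate for a single-step version is a Fortran-order reshape straight to (3,3,N,N). The sketch below checks it against the two-step code on dummy data; `invCrossMatrix_gpu` here is a hypothetical stand-in filled with arange, not the real output of gpuinv3x3.

```python
import numpy as np

N = 5
# Stand-in for the flat GPU output (dummy values, one per position)
invCrossMatrix_gpu = np.arange(3 * 3 * N**2, dtype=float)

# Two-step version from the question
invCrossMatrix_temp = invCrossMatrix_gpu.reshape(N**2, 3, 3)
two_step = invCrossMatrix_temp.reshape(N, N, 3, 3).T

# Candidate one-step version: column-major reshape to the final shape
one_step = invCrossMatrix_gpu.reshape((3, 3, N, N), order='F')

identical = np.array_equal(two_step, one_step)
print(one_step.shape, identical)
```

The check works because a C-order reshape to (N,N,3,3) followed by .T (which reverses all axes) indexes the flat buffer the same way as an 'F'-order reshape to (3,3,N,N).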

UPDATE 1: Is it right to say that the array invCrossMatrix defined by:

    invCrossMatrix = (invCrossMatrix_temp.reshape(N,N,3,3)).T

has dimensions (3,3,N,N)?

@hpaulj: Is that equivalent to the following?

    invCrossMatrix = (invCrossMatrix_temp.reshape(N,N,3,3)).transpose(2,3,0,1)

Regards
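A quick way to test both points of the update is to compare the two expressions on dummy data. In this sketch `temp` is a hypothetical stand-in for invCrossMatrix_temp, filled with distinct values so that any difference in element placement shows up.

```python
import numpy as np

N = 4
# Stand-in for invCrossMatrix_temp, shape (N**2, 3, 3), all values distinct
temp = np.arange(N * N * 3 * 3).reshape(N**2, 3, 3)

via_T = temp.reshape(N, N, 3, 3).T                          # reverses all axes
via_perm = temp.reshape(N, N, 3, 3).transpose(2, 3, 0, 1)   # swaps axis pairs
via_rev = temp.reshape(N, N, 3, 3).transpose(3, 2, 1, 0)    # explicit reversal

shapes_match = via_T.shape == via_perm.shape == (3, 3, N, N)
T_is_full_reversal = np.array_equal(via_T, via_rev)
T_equals_perm = np.array_equal(via_T, via_perm)

print(shapes_match, T_is_full_reversal, T_equals_perm)
```

On this data both results have shape (3,3,N,N), but .T matches transpose(3,2,1,0) rather than transpose(2,3,0,1), so the elements end up in different positions.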
