Matrix multiplication using cuBLAS on Alea GPU


I'm trying to use Gemm for matrix multiplication on Alea GPU; however, this code gives the wrong result.

Gpu gpu = Gpu.Default;
Blas blas = new Blas(gpu);

int m=2,n=3;    //m = rows of A and C, n = columns of B and C (output C is an m x n matrix)
int k=4;        //k = columns of A = rows of B

//column major: each inner {...} below is one column of the matrix
float[,] A = new float[4,2] { {100,200},{2,6},{3,7},{4,8} };    //2x4 matrix
float[,] B = new float[3,4] { {1,4,7,10}, {2,5,8,11}, {3,6,9,12} }; //4x3 matrix
float[,] C = new float[3,2] { {-1,-1}, {-1,-1}, {-1,-1}  }; //2x3 matrix

var dA = gpu.AllocateDevice<float>(A);  
var dB = gpu.AllocateDevice<float>(B);  
var dC = gpu.AllocateDevice<float>(C);

blas.Gemm(Operation.N,Operation.N,m,n,k,1f,dA.Ptr,m,dB.Ptr,k,0f,dC.Ptr,m);

var result = Gpu.Copy2DToHost(dC);

This is the result I get. It just copies some numbers from matrix A, and some entries of C keep their initial value of -1:

100 -1 -1
200 -1 -1
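For comparison, here is a plain C# CPU sketch (no Alea calls; `Sgemm` and the array names are hypothetical) of what a column-major SGEMM computes: element (i, j) of an operand with leading dimension ld is read at offset i + j * ld, and C = alpha * A * B + beta * C. Fed the memory order of the arrays above with lda = m, ldb = k, ldc = m, it produces the product the GPU call should return:

```csharp
using System;

// Column-major SGEMM reference: C = alpha * A * B + beta * C, where
// A is m x k, B is k x n, C is m x n, and element (i, j) of an operand
// with leading dimension ld is stored at index i + j * ld.
static void Sgemm(int m, int n, int k, float alpha, float[] a, int lda,
                  float[] b, int ldb, float beta, float[] c, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
        {
            float sum = 0f;
            for (int l = 0; l < k; l++)
                sum += a[i + l * lda] * b[l + j * ldb];
            int idx = i + j * ldc;
            // BLAS convention: C is not read when beta == 0.
            c[idx] = beta == 0f ? alpha * sum : alpha * sum + beta * c[idx];
        }
}

// Memory order of the question's float[4,2] A, float[3,4] B and float[3,2] C.
float[] packedA = { 100, 200, 2, 6, 3, 7, 4, 8 };
float[] packedB = { 1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12 };
float[] packedC = { -1, -1, -1, -1, -1, -1 };

// Same arguments as the question: m = 2, n = 3, k = 4, lda = m, ldb = k, ldc = m.
Sgemm(2, 3, 4, 1f, packedA, 2, packedB, 4, 0f, packedC, 2);

// Print C as a 2x3 matrix.
for (int i = 0; i < 2; i++)
{
    var row = new float[3];
    for (int j = 0; j < 3; j++)
        row[j] = packedC[i + j * 2];
    Console.WriteLine(string.Join(" ", row));
}
// 169 278 387
// 353 574 795
```

Every entry of C changes here, so output that still contains -1 means Gemm never wrote to the positions that were copied back.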

Is there anything wrong with the code? Please help.

I'm using Alea GPU 3.0.3 with CUDA Toolkit 8.0.

UPDATE 1: I've found that it gives the correct result when I flatten the A, B, and C matrices to 1D arrays. However, I still want to know what's wrong with 2D arrays.
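A likely reason flattening works: a C# float[rows, cols] is stored row by row with no gaps, so copying it element by element into a float[] keeps exactly the order cuBLAS expects, and consecutive columns of the column-major matrix end up exactly m floats apart (lda = m). A small CPU sketch that checks this indexing (the helper `Flatten` is hypothetical, not an Alea API):

```csharp
using System;

// Copies a C# rectangular array into a float[] in the order the runtime
// stores it: row by row, last index fastest, with no gaps.
static float[] Flatten(float[,] src)
{
    int rows = src.GetLength(0), cols = src.GetLength(1);
    var dst = new float[rows * cols];
    for (int r = 0; r < rows; r++)
        for (int col = 0; col < cols; col++)
            dst[r * cols + col] = src[r, col];
    return dst;
}

float[,] A = { { 100, 200 }, { 2, 6 }, { 3, 7 }, { 4, 8 } }; // 4 rows of 2
float[] flatA = Flatten(A);
const int M = 2, K = 4; // A is M x K in column-major terms
const int Lda = M;      // packed: consecutive columns are exactly M floats apart

// Element (i, j) of the column-major matrix is flatA[i + j * Lda],
// which is the same value as A[j, i] in the 2D array.
for (int i = 0; i < M; i++)
{
    var row = new float[K];
    for (int j = 0; j < K; j++)
        row[j] = flatA[i + j * Lda];
    Console.WriteLine(string.Join(" ", row));
}
// 100 2 3 4
// 200 6 7 8
```

Because flatA[i + j * m] equals A[j, i], passing lda = m to Gemm is only valid when the device copy has this same gap-free layout.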

Answer by koonyook:

I've found that gpu.AllocateDevice for a 2D array does not lay the data out on the GPU the way it is laid out on the CPU. The distance between the first elements of any two consecutive columns (the pitch) is surprisingly large, so there is padding after each column.

Therefore, each leading-dimension parameter must be set to that operand's pitch in elements instead of m, k, and m.

blas.Gemm(Operation.N, Operation.N, m, n, k, 1f,
          dA.Ptr, dA.PitchInElements.ToInt32(),
          dB.Ptr, dB.PitchInElements.ToInt32(),
          0f, dC.Ptr, dC.PitchInElements.ToInt32());
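The failure and the fix can be reproduced on the CPU by emulating a pitched allocation. This sketch assumes a pitch of 64 elements and zero-filled padding purely for illustration (the real pitch comes from PitchInElements, and the contents of GPU padding are unspecified); `ToPitched`, `Sgemm` and `Column` are hypothetical helpers, not Alea APIs. With the packed leading dimensions from the question, A's columns 1 to 3 and B's columns 1 and 2 are read from padding, and C's columns 1 and 2 are written into the padding after column 0, which yields the same 100, 200, -1 pattern as the question:

```csharp
using System;
using System.Globalization;

// CPU emulation of a pitched 2D allocation (assumption for illustration):
// row r of the .NET array, which is column r of the column-major matrix,
// starts at offset r * pitch; the gap after it is padding (left as 0 here).
static float[] ToPitched(float[,] src, int pitch)
{
    int rows = src.GetLength(0), cols = src.GetLength(1);
    var dst = new float[rows * pitch];
    for (int r = 0; r < rows; r++)
        for (int col = 0; col < cols; col++)
            dst[r * pitch + col] = src[r, col];
    return dst;
}

// Column-major SGEMM with beta = 0: C(i,j) = alpha * sum_l A(i,l) * B(l,j),
// where X(i,j) is read from or written to x[i + j * ldx].
static void Sgemm(int m, int n, int k, float alpha, float[] a, int lda,
                  float[] b, int ldb, float[] c, int ldc)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < m; i++)
        {
            float sum = 0f;
            for (int l = 0; l < k; l++)
                sum += a[i + l * lda] * b[l + j * ldb];
            c[i + j * ldc] = alpha * sum;
        }
}

// Formats column j (m entries) of a pitched column-major buffer.
static string Column(float[] x, int pitch, int m, int j)
{
    var parts = new string[m];
    for (int i = 0; i < m; i++)
        parts[i] = x[i + j * pitch].ToString(CultureInfo.InvariantCulture);
    return string.Join(" ", parts);
}

const int Pitch = 64; // hypothetical pitch in elements
float[,] hostA = { { 100, 200 }, { 2, 6 }, { 3, 7 }, { 4, 8 } };      // 2x4
float[,] hostB = { { 1, 4, 7, 10 }, { 2, 5, 8, 11 }, { 3, 6, 9, 12 } }; // 4x3
float[,] hostC = { { -1, -1 }, { -1, -1 }, { -1, -1 } };               // 2x3

// Question's call: packed leading dimensions (m, k, m) on pitched buffers.
float[] wrongC = ToPitched(hostC, Pitch);
Sgemm(2, 3, 4, 1f, ToPitched(hostA, Pitch), 2, ToPitched(hostB, Pitch), 4, wrongC, 2);

// Answer's call: every leading dimension is the pitch in elements.
float[] rightC = ToPitched(hostC, Pitch);
Sgemm(2, 3, 4, 1f, ToPitched(hostA, Pitch), Pitch, ToPitched(hostB, Pitch), Pitch, rightC, Pitch);

for (int j = 0; j < 3; j++)
    Console.WriteLine($"column {j}: packed ld -> {Column(wrongC, Pitch, 2, j)}, pitch ld -> {Column(rightC, Pitch, 2, j)}");
// column 0: packed ld -> 100 200, pitch ld -> 169 353
// column 1: packed ld -> -1 -1, pitch ld -> 278 574
// column 2: packed ld -> -1 -1, pitch ld -> 387 795
```

The fix passes each operand's own pitch rather than one shared value, since in general the allocator may choose a different pitch for each array.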

Now I get the correct result. However, is there any documentation explaining how Alea actually allocates 2D arrays on the GPU?

I can only find this page, which has no explanation: http://www.aleagpu.com/release/3_0_3/api/html/6f0dc687-7191-91ba-6c30-bb379dded567.htm