CUDA cublasGetMatrix / cublasSetMatrix fails | Explanation of arguments

I'm trying to copy the 3x4 matrix [1 2 3 4 ; 5 6 7 8 ; 9 10 11 12], stored in column-major order in x, to a matrix d_x in the memory of an NVIDIA GPU using cublasSetMatrix(), and then to copy d_x back to the host array y using cublasGetMatrix().

#include<stdio.h>
#include"cublas_v2.h"

int main()
{
    cublasHandle_t hand;
    float x[][3] = { {1,5,9} , {2,6,10} , {3,7,11} , {4,8,12} };
    float y[4][3] = {};
    float *d_x;

    printf("X\n");
    for( int i=0 ; i<4 ; i++ )
    {
        printf("Row %i:",i+1);
        for( int j = 0 ; j<3 ; j++ )
        {
            printf(" %f",x[i][j]);
        }
        putchar('\n');
    }
    printf("Y\n");
    for( int i=0 ; i<4 ; i++ )
    {
        printf("Row %i:",i+1);
        for( int j = 0 ; j<3 ; j++ )
        {
            printf(" %f",y[i][j]);
        }
        putchar('\n');
    }

    cublasCreate( &hand );
    cudaMalloc( &d_x,sizeof(d_x) );
    cublasSetMatrix( 3,4,sizeof(float),x,3,d_x,3 );
    cublasGetMatrix( 3,4,sizeof(float),d_x,3,y,3 );

    printf("X\n");
    for( int i=0 ; i<4 ; i++ )
    {
        printf("Row %i:",i+1);
        for( int j = 0 ; j<3 ; j++ )
        {
            printf(" %f",x[i][j]);
        }
        putchar('\n');
    }
    printf("Y\n");
    for( int i=0 ; i<4 ; i++ )
    {
        printf("Row %i:",i+1);
        for( int j = 0 ; j<3 ; j++ )
        {
            printf(" %f",y[i][j]);
        }
        putchar('\n');
    }


    cudaFree( d_x );
    cublasDestroy( hand );
    return 0;
}

The output after the copy shows y filled with 0s.

Did any of the cuBLAS function calls fail, or have the wrong arguments been passed to them?

Also, please explain the purpose of each argument to the functions.

Using GeForce GTX 650 with CUDA 6.5 on Fedora 21 x86_64.

Answer by Robert Crovella (best answer)

The only actual problem in your code is here:

cudaMalloc( &d_x,sizeof(d_x) );

sizeof(d_x) is just the size of the pointer itself (8 bytes on x86_64), not the 48 bytes that the 3x4 float matrix needs. You can fix it like this:

cudaMalloc( &d_x,sizeof(x) );
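
To make the size mismatch concrete, here is a small host-only sketch. The function names matrix_bytes and pointer_bytes are made up for illustration; the declarations mirror x and d_x from the question.

```c
#include <stddef.h>

/* Size of the host matrix from the question: 4 columns of 3 floats. */
size_t matrix_bytes(void)
{
    float x[4][3];
    return sizeof(x);      /* whole array: 12 * sizeof(float) */
}

/* Size of the pointer variable itself, not of what it points to. */
size_t pointer_bytes(void)
{
    float *d_x;
    return sizeof(d_x);    /* same as sizeof(float *) */
}
```

With 4-byte floats, matrix_bytes() is 48 while pointer_bytes() is only the width of one pointer, so an allocation of sizeof(d_x) bytes cannot hold the matrix.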

If you want to find out if a CUBLAS API call is failing, then you should check the return code of the API call:

cublasStatus_t res = cublasSetMatrix( 3,4,sizeof(float),x,3,d_x,3 );

Regarding a description of the parameters, you have them all correct (other than the allocation error associated with d_x). So it's not clear which one you need a description of, but they are all described in the documentation.
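
For reference, both functions take the arguments (rows, cols, elemSize, A, lda, B, ldb): a rows x cols tile of elements, each elemSize bytes wide, is copied from the column-major source A with leading dimension lda to the column-major destination B with leading dimension ldb. For cublasSetMatrix, A is in host memory and B in GPU memory; for cublasGetMatrix it is the other way round. The sketch below is only a host-memory illustration of that addressing, not how cuBLAS implements the transfer, and copy_matrix_host is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-only stand-in for the copy semantics: column j of the
   tile starts j*lda elements into A and j*ldb elements into B, and only
   the first `rows` elements of each column are copied. */
void copy_matrix_host(int rows, int cols, int elemSize,
                      const void *A, int lda, void *B, int ldb)
{
    const char *src = (const char *)A;
    char *dst = (char *)B;
    for (int j = 0; j < cols; j++) {
        memcpy(dst + (size_t)j * ldb * elemSize,
               src + (size_t)j * lda * elemSize,
               (size_t)rows * elemSize);
    }
}
```

Called as copy_matrix_host(3, 4, sizeof(float), x, 3, y, 3), this copies the whole matrix from the question. With rows = 2, cols = 2, lda = 3 and ldb = 2 it would instead copy only the top-left 2x2 block into a tightly packed buffer, which is what the leading dimensions are for.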

CUDA API calls (like cudaMalloc) also return an error code, so you should check those as well. Any time you're having trouble with CUDA code, it's a good idea to use proper CUDA error checking. You can also run your code with cuda-memcheck as a quick test.
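
One possible way to combine these points is sketched below: the copy part of the program from the question, with the allocation fixed and every call checked. The macro names CHECK_CUDA and CHECK_CUBLAS are made up for this example and are not part of CUDA or cuBLAS. It has to be built with the CUDA toolkit, for example with nvcc and -lcublas.

```c
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include "cublas_v2.h"

/* Hypothetical helper macros: report where a call failed, then exit. */
#define CHECK_CUDA(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

#define CHECK_CUBLAS(call)                                            \
    do {                                                              \
        cublasStatus_t st_ = (call);                                  \
        if (st_ != CUBLAS_STATUS_SUCCESS) {                           \
            fprintf(stderr, "cuBLAS error at %s:%d: status %d\n",     \
                    __FILE__, __LINE__, (int)st_);                    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

int main(void)
{
    cublasHandle_t hand;
    /* 3x4 matrix in column-major order: each inner array is one column. */
    float x[4][3] = { {1,5,9}, {2,6,10}, {3,7,11}, {4,8,12} };
    float y[4][3] = { {0} };
    float *d_x = NULL;

    CHECK_CUBLAS(cublasCreate(&hand));
    /* Allocate room for the whole matrix, not just sizeof(d_x). */
    CHECK_CUDA(cudaMalloc((void **)&d_x, sizeof(x)));
    CHECK_CUBLAS(cublasSetMatrix(3, 4, sizeof(float), x, 3, d_x, 3));
    CHECK_CUBLAS(cublasGetMatrix(3, 4, sizeof(float), d_x, 3, y, 3));

    /* Print y column by column after the round trip through the GPU. */
    for (int j = 0; j < 4; j++) {
        printf("Column %d: %f %f %f\n", j + 1, y[j][0], y[j][1], y[j][2]);
    }

    CHECK_CUDA(cudaFree(d_x));
    CHECK_CUBLAS(cublasDestroy(hand));
    return 0;
}
```

Exiting on the first failure keeps the sketch short; a larger program would more likely return the error to its caller and release resources first.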