I'm experimenting with cuBLAS, trying to get a dot product and a matrix-vector product to work, and I've run into a problem. First off, the code:
```c
float result_1;
cublasSdot_v2(handle, c_nrSV[0] + 1, d_results[0], 1, d_ones, 1, &result_1);
```
c_nrSV is an integer vector, d_results is an array in host memory holding pointers to cuBLAS vectors in device memory, and d_ones is a pointer to a cuBLAS vector in device memory. By "cuBLAS vector" I mean a device array whose values were set with cublasSetVector().
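For reference, here is a plain-C sketch of what the cublasSdot_v2 call computes, independent of where the result is stored: the sum of x[i * incx] * y[i * incy] over the first n elements. With d_ones as the second vector, the call therefore just sums the first c_nrSV[0] + 1 entries of d_results[0]. The name ref_sdot is hypothetical, and the sketch only models positive increments (cuBLAS also accepts negative ones).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference for cublasSdot:
 * returns sum_{i=0}^{n-1} x[i * incx] * y[i * incy].
 * Only positive strides are modelled; n == 0 yields 0. */
float ref_sdot(int n, const float *x, int incx, const float *y, int incy)
{
    assert(n >= 0 && incx > 0 && incy > 0);
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        sum += x[(size_t)i * incx] * y[(size_t)i * incy];
    }
    return sum;
}
```

With y filled with ones and incx = 1, ref_sdot(3, x, 1, ones, 1) is x[0] + x[1] + x[2], which is the "sum via dot product with ones" pattern used above.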
That cublasSdot_v2 call runs without any problem. If I understand correctly, result_1 lives in host memory and cuBLAS copies the result of the dot product into it. Since I want to use the results of the dot product in further GPU computations, I would rather keep them in GPU memory. The cuBLAS documentation states that the result can be in either host or device memory, so I tried the following:
```c
float* result_2;
cudaMalloc((void**)&result_2, sizeof(float));
cublasSdot_v2(handle, c_nrSV[0] + 1, d_results[0], 1, d_ones, 1, result_2);
```
This crashes with the error "Access violation writing location 0x0000000701040C00", and I'm not sure what is going on. I have the same issue with cublasSgemv:
```c
float alpha = 1;
float beta = 0;
cublasSgemv_v2(handle, CUBLAS_OP_N, c_nrSV[i], nrFeatures, &alpha, d_svms[i], c_nrSV[i], d_fvec, 1, &beta, d_results[i], 1);
```
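To make the semantics of this cublasSgemv_v2 call concrete: with CUBLAS_OP_N it computes y = alpha * A * x + beta * y, where A is an m-by-n matrix stored column-major with leading dimension lda. Here that means d_svms[i] is read as a c_nrSV[i]-by-nrFeatures matrix with lda = c_nrSV[i]. The plain-C sketch below (hypothetical name ref_sgemv_n, positive increments only) mirrors that computation on the host; it does not model how cuBLAS reads alpha and beta.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference for cublasSgemv with CUBLAS_OP_N:
 * y = alpha * A * x + beta * y, A is m-by-n, column-major,
 * element (r, c) stored at A[r + c * lda], lda >= m.
 * As in BLAS, y is not read when beta == 0. */
void ref_sgemv_n(int m, int n, float alpha, const float *A, int lda,
                 const float *x, int incx, float beta, float *y, int incy)
{
    assert(m >= 0 && n >= 0 && lda >= m && incx > 0 && incy > 0);
    for (int r = 0; r < m; ++r) {
        float acc = 0.0f;
        for (int c = 0; c < n; ++c)
            acc += A[r + (size_t)c * lda] * x[(size_t)c * incx];
        float *yr = &y[(size_t)r * incy];
        *yr = (beta == 0.0f) ? alpha * acc : alpha * acc + beta * *yr;
    }
}
```

For the column-major 2-by-3 matrix {1, 2, 3, 4, 5, 6} (rows [1 3 5] and [2 4 6]) and x = {1, 1, 1}, alpha = 1 and beta = 0 give y = {9, 12}.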
That cublasSgemv_v2 call runs without any issue. The documentation states that alpha and beta can also be in GPU memory, but if I allocate alpha and beta in device memory and initialize them like this:
```c
float h_alpha = 1;
float h_beta = 0;
float* alpha;
float* beta;
cudaMalloc((void**)&alpha, sizeof(float));
cudaMalloc((void**)&beta, sizeof(float));
cudaMemcpy(alpha, &h_alpha, sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(beta, &h_beta, sizeof(float), cudaMemcpyHostToDevice);
cublasSgemv_v2(handle, CUBLAS_OP_N, c_nrSV[i], nrFeatures, alpha, d_svms[i], c_nrSV[i], d_fvec, 1, beta, d_results[i], 1);
```
I get a similar error: "Access violation reading location 0x0000000701040E00."
What is going on? Do I have to specify somewhere that the memory is on the device and not on the host?