I want to compute the LU decomposition of a square matrix A using scikit-cuda and pycuda. I tried a few demo scripts from the scikit-cuda GitHub repository and they all ran fine.
However, when I call the low-level interface cublas.cublasDgetrfBatched in my own small example, it fails with the error message below:
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
Here is my small Python example:
import numpy as np
import pycuda.autoinit
import skcuda.cublas as cublas
import pycuda.gpuarray as gpuarray
N = 10
N_BATCH = 1 # only 1 matrix to be decomposed
A_SHAPE = (N, N)
a = np.random.rand(*A_SHAPE).astype(np.float64)
a_batch = np.expand_dims(a, axis=0)
a_gpu = gpuarray.to_gpu(a_batch.T.copy()) # transpose a to follow "F" order
p_gpu = gpuarray.zeros(N * N_BATCH, np.int32)
info_gpu = gpuarray.zeros(N_BATCH, np.int32)
cublas_handle = cublas.cublasCreate()
cublas.cublasDgetrfBatched(
    cublas_handle,
    N,
    a_gpu.gpudata,
    N,
    p_gpu.gpudata,
    info_gpu.gpudata,
    N_BATCH,
)
cublas.cublasDestroy(cublas_handle)
print(a_gpu)
I am new to scikit-cuda. Could someone point out what I am doing wrong?
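As @talonmies commented, cublasDgetrfBatched does not take a pointer to the matrix data itself. It expects a pointer to an array, stored on the device, that holds the device address of each matrix in the batch. Passing a_gpu.gpudata directly makes cuBLAS interpret the matrix entries as addresses, which causes the illegal memory access.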
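A minimal sketch of that fix, assuming the batch is stored contiguously in a single GPUArray. The helper names batch_pointer_array and lu_batched are made up for illustration and are not part of scikit-cuda. The GPU part needs a CUDA device and is kept inside a function so the address arithmetic can be checked on its own.

```python
import numpy as np


def batch_pointer_array(base_ptr, n_batch, matrix_nbytes):
    """Host-side array of addresses, one per matrix, for a contiguous batch.

    Hypothetical helper: matrix i is assumed to start at
    base_ptr + i * matrix_nbytes.
    """
    offsets = np.arange(n_batch, dtype=np.uint64) * np.uint64(matrix_nbytes)
    return np.uint64(base_ptr) + offsets


def lu_batched(a_batch):
    """LU-factorize a (n_batch, n, n) float64 array with cublasDgetrfBatched.

    Hypothetical wrapper; requires a CUDA device, pycuda and scikit-cuda.
    """
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    import pycuda.gpuarray as gpuarray
    import skcuda.cublas as cublas

    n_batch, n, _ = a_batch.shape
    # cuBLAS is column-major: transpose each matrix, keep the batch axis first.
    a_gpu = gpuarray.to_gpu(np.ascontiguousarray(a_batch.transpose(0, 2, 1)))

    # Device array holding the device address of each matrix in the batch.
    ptrs = batch_pointer_array(a_gpu.ptr, n_batch, n * n * a_gpu.dtype.itemsize)
    a_ptrs_gpu = gpuarray.to_gpu(ptrs)

    p_gpu = gpuarray.zeros(n * n_batch, np.int32)
    info_gpu = gpuarray.zeros(n_batch, np.int32)

    handle = cublas.cublasCreate()
    try:
        cublas.cublasDgetrfBatched(
            handle,
            n,
            a_ptrs_gpu.gpudata,  # pointer to the array of matrix addresses
            n,
            p_gpu.gpudata,
            info_gpu.gpudata,
            n_batch,
        )
    finally:
        cublas.cublasDestroy(handle)

    # Transpose back so each factorized matrix is in row-major layout again.
    lu = a_gpu.get().transpose(0, 2, 1)
    return lu, p_gpu.get().reshape(n_batch, n), info_gpu.get()
```

With the arrays from the question, this would be called as lu_batched(a_batch). Note that the sketch transposes only the last two axes, whereas a_batch.T reverses all three; the two agree for N_BATCH = 1 but not for larger batches.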