cudaMemcpyAsync giving "invalid memory access" error

In my employer's codebase, I'm trying to debug an "invalid memory access" error from cudaMemcpyAsync.

The function call is

CHECK_CUDA( cudaMemcpyAsync(A, B, sizeof(B), cudaMemcpyDeviceToHost, stream) )

where A and B are both int*, but B is allocated on the device with

cudaMalloc((void**) &B, sizeof(B))

When it says invalid memory access, what is it trying to access that is invalid? How can I find out what is being inappropriately accessed?

There are 2 answers

Robert Crovella

The invalid memory access error does not actually refer to the cudaMemcpyAsync operation, so studying that call alone is unlikely to yield anything useful.

CUDA uses an asynchronous reporting mechanism to report device code execution errors "at the next opportunity" via the host API. So the error you are seeing could refer to any kernel execution that took place prior to that call.

To help localize the error, you can try enabling launch blocking (set the environment variable CUDA_LAUNCH_BLOCKING=1) when you run your code. How useful this is will depend on exactly how the code is written, and whether any error checking is being done after CUDA kernel launches. If you compile your code with --lineinfo, or even if you don't, you can get additional localization information about the problem using the method indicated here.
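
For illustration, here is a minimal sketch of that kind of per-launch checking. The CHECK_CUDA macro below is a hypothetical definition written to mirror the one named in the question, and the kernel and sizes are placeholders:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical CHECK_CUDA definition; the macro in your codebase may differ,
// but the pattern is the same: translate the error code to text and stop.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",        \
                    cudaGetErrorString(err_), __FILE__, __LINE__); \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

__global__ void scale(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

int main(void)
{
    const int n = 1024;
    int *d_data = nullptr;
    CHECK_CUDA(cudaMalloc(&d_data, n * sizeof(int)));

    scale<<<(n + 255) / 256, 256>>>(d_data, n);
    CHECK_CUDA(cudaGetLastError());      // catches launch-configuration errors immediately
    CHECK_CUDA(cudaDeviceSynchronize()); // debug builds: forces device-side execution errors
                                         // to be reported here, at the offending launch

    CHECK_CUDA(cudaFree(d_data));
    return 0;
}

With that synchronize in place (or with CUDA_LAUNCH_BLOCKING=1 set in the environment), a device-side fault surfaces at the launch that caused it rather than at a later, unrelated call such as cudaMemcpyAsync.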

The observation in the comment is a good one, and is perhaps an important clue to coding defects. I will note that:

  • albeit curious, as posted, the transfer size is consistent with the allocation size, so the operation itself is unlikely to be throwing an error for that reason (a sketch spelling this out follows the list)
  • based on my experience with CUDA error reporting (i.e. familiarity with error codes and their text translations) the "invalid memory access" error is attributable to a device code execution error. If the CUDA runtime can determine that a given transfer size is inconsistent with an allocation size, the error given will be "invalid argument".
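
On the first point, since B is declared as int *, sizeof(B) is the size of the pointer itself (typically 8 bytes), so the allocation and the transfer agree with each other even though neither is likely the size that was actually intended. Here is a small sketch spelling out the distinction; the element count N is hypothetical:

#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int *B = nullptr;

    // As posted: both cudaMalloc and cudaMemcpyAsync use sizeof(B),
    // i.e. the size of the pointer itself, so the two sizes match.
    printf("sizeof(B)       = %zu\n", sizeof(B));         // 8 on a 64-bit platform

    // Likely intent: allocate (and later copy) N ints.
    const size_t N = 64;                                   // hypothetical element count
    printf("N * sizeof(int) = %zu\n", N * sizeof(int));    // 256

    cudaMalloc(&B, N * sizeof(int));
    cudaFree(B);
    return 0;
}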

You can take a look at section 12 in this online training series to get a more in-depth treatment of CUDA error reporting, as well as debugging suggestions.

Anis Ladram

I recommend against debugging a CUDA program using CUDA error codes due to the asynchronous nature of CUDA (see Robert's answer).

NVIDIA provides users with a tool called compute-sanitizer (shipped as part of the CUDA toolkit) that locates these memory issues for you. Example below:

$ cat test.cu
__global__ void kernel(int *ptr)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    ptr[i] = i;
}

int main(void)
{
    int* ptr;
    cudaMalloc(&ptr, 63 * sizeof(int)); // Correct size should be `64 * sizeof(int)`
    kernel<<<1, 64>>>(ptr);
}
$ nvcc -o test test.cu
$ compute-sanitizer --show-backtrace=device ./test
========= COMPUTE-SANITIZER
========= Invalid __global__ write of size 4 bytes
=========     at 0x70 in kernel(int *)
=========     by thread (63,0,0) in block (0,0,0)
=========     Address 0x7fc8efe000fc is out of bounds
=========     and is 1 bytes after the nearest allocation at 0x7fc8efe00000 of size 252 bytes
=========
========= ERROR SUMMARY: 1 error