cudaMemcpyAsync giving "invalid memory access" error

In my employer's codebase, I'm trying to debug an "invalid memory access" error from cudaMemcpyAsync.

The function call is

CHECK_CUDA( cudaMemcpyAsync(A, B, sizeof(B), cudaMemcpyDeviceToHost, stream) )

where A and B are both int*, but B is allocated on the device with

cudaMalloc((void**) &B, sizeof(B))

When it says invalid memory access, what is it trying to access that is invalid? How can I find out what is being inappropriately accessed?

There are 2 answers

Robert Crovella

The invalid memory access error does not actually refer to the cudaMemcpyAsync operation, so studying that call alone is unlikely to yield anything useful.

CUDA uses an asynchronous reporting mechanism to report device code execution errors "at the next opportunity" via the host API. So the error you are seeing could refer to any kernel execution that took place prior to that call.

To help localize the error, you can try enabling launch blocking (set the environment variable CUDA_LAUNCH_BLOCKING=1) when you run your code. How useful this is will depend on exactly how the code is written, and whether any error checking is being done after CUDA kernel launches. If you compile your code with --lineinfo, or even if you don't, you can get additional localization information about the problem using the method indicated here.
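
For illustration, here is a minimal sketch of that kind of per-launch checking. The CHECK_CUDA macro below is a hypothetical definition written to mirror the one named in the question, and the kernel and sizes are placeholders:

#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical CHECK_CUDA definition; the macro in your codebase may differ,
// but the pattern is the same: translate the error code to text and stop.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",        \
                    cudaGetErrorString(err_), __FILE__, __LINE__); \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

__global__ void scale(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= 2;
}

int main(void)
{
    const int n = 1024;
    int *d_data = nullptr;
    CHECK_CUDA(cudaMalloc(&d_data, n * sizeof(int)));

    scale<<<(n + 255) / 256, 256>>>(d_data, n);
    CHECK_CUDA(cudaGetLastError());      // catches launch-configuration errors immediately
    CHECK_CUDA(cudaDeviceSynchronize()); // debug builds: forces device-side execution errors
                                         // to be reported here, at the offending launch

    CHECK_CUDA(cudaFree(d_data));
    return 0;
}

With that synchronize in place (or with CUDA_LAUNCH_BLOCKING=1 set in the environment), a device-side fault surfaces at the launch that caused it rather than at a later, unrelated call such as cudaMemcpyAsync.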

The observation in the comment is a good one, and is perhaps an important clue to coding defects. I will note that:

  • albeit curious, as posted, the transfer size is consistent with the allocation size, so the operation itself is unlikely to be throwing an error for that reason (a sketch spelling this out follows the list)
  • based on my experience with CUDA error reporting (i.e. familiarity with error codes and their text translations) the "invalid memory access" error is attributable to a device code execution error. If the CUDA runtime can determine that a given transfer size is inconsistent with an allocation size, the error given will be "invalid argument".
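
On the first point, since B is declared as int *, sizeof(B) is the size of the pointer itself (typically 8 bytes), so the allocation and the transfer agree with each other even though neither is likely the size that was actually intended. Here is a small sketch spelling out the distinction; the element count N is hypothetical:

#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    int *B = nullptr;

    // As posted: both cudaMalloc and cudaMemcpyAsync use sizeof(B),
    // i.e. the size of the pointer itself, so the two sizes match.
    printf("sizeof(B)       = %zu\n", sizeof(B));         // 8 on a 64-bit platform

    // Likely intent: allocate (and later copy) N ints.
    const size_t N = 64;                                   // hypothetical element count
    printf("N * sizeof(int) = %zu\n", N * sizeof(int));    // 256

    cudaMalloc(&B, N * sizeof(int));
    cudaFree(B);
    return 0;
}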

You can take a look at section 12 in this online training series to get a more in-depth treatment of CUDA error reporting, as well as debugging suggestions.

Anis Ladram

I recommend against debugging a CUDA program using CUDA error codes due to the asynchronous nature of CUDA (see Robert's answer).

NVIDIA provides users with a tool called compute-sanitizer (shipped as part of the CUDA toolkit) that locates these memory issues for you. Example below:

$ cat test.cu
__global__ void kernel(int *ptr)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    ptr[i] = i;
}

int main(void)
{
    int* ptr;
    cudaMalloc(&ptr, 63 * sizeof(int)); // Correct size should be `64 * sizeof(int)`
    kernel<<<1, 64>>>(ptr);
}
$ nvcc -o test test.cu
$ compute-sanitizer --show-backtrace=device ./test
========= COMPUTE-SANITIZER
========= Invalid __global__ write of size 4 bytes
=========     at 0x70 in kernel(int *)
=========     by thread (63,0,0) in block (0,0,0)
=========     Address 0x7fc8efe000fc is out of bounds
=========     and is 1 bytes after the nearest allocation at 0x7fc8efe00000 of size 252 bytes
=========
========= ERROR SUMMARY: 1 error