CUDA cudaMemcpyAsync using single stream to host

Question

Asked by Yona on 07 February 2021 at 13:59

I have a single kernel that writes data to two output parameters (dev_out_1 and dev_out_2) using a single stream. I want to copy the data back from the device to the host in parallel. My requirement is to use a single stream and still copy back to the host in parallel.

How do you handle this kind of issue?

SomeCudaCall<<<25, 34>>>(input, dev_out_1, dev_out_2);
cudaMemcpyAsync(toHere_1, dev_out_1, sizeof(int), cudaMemcpyDeviceToHost, 0);
cudaMemcpyAsync(toHere_2, dev_out_2, sizeof(int), cudaMemcpyDeviceToHost, 0);

There is 1 answer

**talonmies** · Accepted Answer · 2021-02-08T08:21:06+00:00

> I wanted to copy back the data from the device to host in parallel

That is not possible.

NVIDIA GPUs can use only one DMA engine for device-to-host transfers (even when the GPU has more than one DMA engine), and that DMA engine can perform only a single transfer at a time. So "parallel" copies in the same direction over the PCI Express bus are not possible.
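
Since the two copies cannot overlap each other, a practical pattern is to queue both in the same stream and wait once. The following is only a minimal sketch of that idea, not code from the question: the kernel `fillOutputs`, the wrapper `runExample`, and the pinned staging buffer are hypothetical names introduced for illustration. It assumes the host buffer is allocated with `cudaMallocHost`, because `cudaMemcpyAsync` is only asynchronous with respect to the host when the host memory is page-locked.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the question's kernel: writes one int per output.
__global__ void fillOutputs(int input, int *out1, int *out2)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *out1 = input + 1;
        *out2 = input + 2;
    }
}

// Launches the kernel and queues both device-to-host copies in ONE stream.
// The copies run in issue order, one after the other, not in parallel.
// Returns 0 on success, 1 on any CUDA error.
int runExample(int *result1, int *result2)
{
    int *dev_out_1 = nullptr;
    int *dev_out_2 = nullptr;
    int *pinned = nullptr;          // page-locked host buffer holding both results
    cudaStream_t stream = nullptr;

    cudaError_t err = cudaStreamCreate(&stream);
    if (err == cudaSuccess) err = cudaMalloc(&dev_out_1, sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&dev_out_2, sizeof(int));
    if (err == cudaSuccess) err = cudaMallocHost(&pinned, 2 * sizeof(int));

    if (err == cudaSuccess) {
        fillOutputs<<<25, 34, 0, stream>>>(40, dev_out_1, dev_out_2);
        err = cudaGetLastError();   // catches launch configuration errors
    }
    // Both copies are queued behind the kernel; the host thread does not block here.
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(&pinned[0], dev_out_1, sizeof(int),
                              cudaMemcpyDeviceToHost, stream);
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(&pinned[1], dev_out_2, sizeof(int),
                              cudaMemcpyDeviceToHost, stream);
    // A single synchronization point covers the kernel and both copies.
    if (err == cudaSuccess) err = cudaStreamSynchronize(stream);

    if (err == cudaSuccess) {
        *result1 = pinned[0];
        *result2 = pinned[1];
    } else {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
    }

    if (pinned) cudaFreeHost(pinned);
    if (dev_out_1) cudaFree(dev_out_1);
    if (dev_out_2) cudaFree(dev_out_2);
    if (stream) cudaStreamDestroy(stream);
    return err == cudaSuccess ? 0 : 1;
}
```

To try it, call `runExample(&a, &b)` from a `main` function and compile with `nvcc`. If the two outputs were allocated as one contiguous device buffer instead, a single `cudaMemcpyAsync` covering both values would avoid the second transfer setup entirely.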
