Direct global memory access using CUDA


Q1: Let's say I have copied an array onto the device through stream 1 using cudaMemcpyAsync; would I be able to access the values of that array in a different stream, say stream 2?

cudaMemcpyAsync(da, a, 10*sizeof(float), cudaMemcpyHostToDevice, stream[0]);
kernel<<<n,1,0,stream[0]>>>(da);
kernel<<<n,1,0,stream[1]>>>(da); // calculation involving da

Q2: Would I have to include a pointer to the global memory array as an argument in the kernel call? Can't I access it directly from the kernel?

__global__ void kernel() { out[threadIdx.x] = 2*input[threadIdx.x]; }

instead of

__global__ void kernel(float *out, float *input) { out[threadIdx.x] = 2*input[threadIdx.x]; }

Q3: Let's say I have all the cudaMalloc and cudaMemcpy calls in one function, the kernel call in another function, and cudaFree in yet another. When I call these functions, they give an "invalid argument" error.

Accepted answer by Robert Crovella:

Q1: Let's say I have copied an array onto the device through stream 1 using cudaMemcpyAsync; would I be able to access the values of that array in a different stream, say stream 2?

Yes, the array da is accessible in both kernels you have shown. However, an important question is whether or not the previous cudaMemcpyAsync operation is complete (or guaranteed to be complete):

cudaMemcpyAsync(da, a, 10*sizeof(float), cudaMemcpyHostToDevice, stream[0]);
kernel<<<n,1,0,stream[0]>>>(da);
kernel<<<n,1,0,stream[1]>>>(da); // calculation involving da

In the case of the first kernel launch above, the cudaMemcpyAsync operation is guaranteed to be complete. In the case of the second, it is not. Operations issued to the same stream are guaranteed to be serialized, i.e. executed in issue order. This guarantees that the cudaMemcpyAsync operation will be complete before the kernel launched into the same stream begins. CUDA activities issued to separate streams may overlap, so there is no guarantee that a kernel issued to a different stream than the preceding cudaMemcpyAsync operation will wait until that operation is complete.
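
If you do need the kernel in stream[1] to see the copied data, one option (a minimal sketch, not part of the original answer; the event name copyDone is illustrative) is to record a CUDA event on stream[0] after the copy and make stream[1] wait on it:

cudaEvent_t copyDone;
cudaEventCreate(&copyDone);

cudaMemcpyAsync(da, a, 10*sizeof(float), cudaMemcpyHostToDevice, stream[0]);
cudaEventRecord(copyDone, stream[0]);        // event completes when the copy does
kernel<<<n,1,0,stream[0]>>>(da);             // same stream: already ordered after the copy

cudaStreamWaitEvent(stream[1], copyDone, 0); // stream[1] waits for the copy to finish
kernel<<<n,1,0,stream[1]>>>(da);             // now guaranteed to see the copied data

cudaEventDestroy(copyDone);

cudaStreamWaitEvent makes all work subsequently issued to stream[1] wait on the event without blocking the host thread.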

Q2: Would I have to include a pointer to the global memory array as an argument in the kernel call? Can't I access it directly from the kernel?

This appears to be a completely separate question. Dynamically allocated global memory arrays (i.e. those allocated with cudaMalloc) require that you pass a pointer to the allocation to the kernel if you want to use that data in a kernel. However, a statically allocated array:

__device__ int data[SIZE];

has file scope, and you need not (and should not) pass the data pointer as a kernel parameter. It can be used directly in a kernel without being passed explicitly. (Note that you don't use cudaMemcpy to populate such an array; use cudaMemcpyToSymbol instead.)
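
For illustration, a minimal sketch of that pattern (SIZE, h_data, and the kernel name twice are illustrative names, not from the original answer):

#include <cuda_runtime.h>

#define SIZE 256

__device__ int data[SIZE];      // file-scope, statically allocated device array

__global__ void twice() {
    data[threadIdx.x] *= 2;     // accessed directly; no kernel parameter needed
}

int main() {
    int h_data[SIZE];
    for (int i = 0; i < SIZE; ++i) h_data[i] = i;

    cudaMemcpyToSymbol(data, h_data, SIZE * sizeof(int));   // populate the __device__ array
    twice<<<1, SIZE>>>();
    cudaMemcpyFromSymbol(h_data, data, SIZE * sizeof(int)); // read the results back
    return 0;
}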

Q3: Let's say I have all the cudaMalloc and cudaMemcpy calls in one function, the kernel call in another function, and cudaFree in yet another. When I call these functions, they give an "invalid argument" error.

I don't think this question can be answered unless you provide an MCVE. SO expects that you provide an MCVE for questions seeking debugging help ("why isn't this code working?").