Q1: Let's say I have copied an array onto the device through stream 1 using `cudaMemcpyAsync`. Would I be able to access the values of that array in a different stream, say stream 2?
cudaMemcpyAsync(da,a,10*sizeof(float),cudaMemcpyHostToDevice,stream[0]);
kernel<<<n,1,0,stream[0]>>>(da);
kernel<<<n,1,0,stream[1]>>>(da); // calculation involving da
Q2: Would I have to pass a pointer to the global memory array as an argument in the kernel call? Can't I access it directly from the kernel?
__global__ void kernel() { out[threadIdx.x] = 2 * input[threadIdx.x]; }
instead of
__global__ void kernel(float *out, float *input) { out[threadIdx.x] = 2 * input[threadIdx.x]; }
Q3: Let's say I have all the malloc and memcpy calls in one function, the kernel call in another function, and the CUDA free in a third. When I call these functions, I get an "invalid argument" error. Why?
Yes, the array `da` is accessible in both kernels you have shown. However, an important question is whether or not the previous `cudaMemcpyAsync` operation is complete (or guaranteed to be complete): in the case of the first kernel launch above, the `cudaMemcpyAsync` operation is guaranteed to be complete. In the case of the second, it is not.

Operations issued to the same stream are guaranteed to be serialized, i.e. executed in issue order. This guarantees that the `cudaMemcpyAsync` operation will be complete before the kernel launched to the same stream begins. CUDA activities issued to separate streams may overlap, so there is no guarantee that a kernel issued to a different stream than the preceding `cudaMemcpyAsync`
operation will wait until that operation is complete.

This appears to be a completely separate question. Dynamically allocated global memory arrays (i.e. those allocated with `cudaMalloc`) will require that you pass a pointer to the allocation to the kernel, if you want to use that data in a kernel. However, a statically allocated array (for example, one named `data` and declared at file scope with the `__device__` qualifier) has file scope, and you need not (and should not) pass the `data` pointer as a kernel parameter. That `data` pointer can be used directly in a kernel without explicitly passing it as a kernel parameter. (Note that you don't use `cudaMemcpy` to populate such an array; you use `cudaMemcpyToSymbol` instead.)

I don't think this question can be answered unless you provide an MCVE. SO expects that you provide an MCVE for "1. Questions seeking debugging help ("why isn't this code working?")".
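
For the first question, one common way to make a kernel in a second stream wait for a copy issued to the first stream is to record an event after the copy and have the second stream wait on that event. The following is only a minimal sketch under assumptions: the kernel `doubleIt`, the helper `run_stream_demo`, the `CHECK` macro, and the output buffer `dout` are hypothetical names that do not appear in the question, and the early `return` on error skips cleanup for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Report the failing call and leave the enclosing function (no cleanup; sketch only).
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
            return 1;                                                     \
        }                                                                 \
    } while (0)

// Hypothetical kernel: one thread per block, matching the question's <<<n,1>>> launch.
__global__ void doubleIt(const float *in, float *out) {
    out[blockIdx.x] = 2.0f * in[blockIdx.x];
}

// Returns 0 if every CUDA call succeeded and the results are as expected.
int run_stream_demo() {
    const int n = 10;
    const size_t bytes = n * sizeof(float);
    float *a = nullptr, *da = nullptr, *dout = nullptr;
    float result[n];
    cudaStream_t stream[2];
    cudaEvent_t copyDone;

    // Pinned host memory, so the host-to-device copy can be truly asynchronous.
    CHECK(cudaMallocHost((void **)&a, bytes));
    CHECK(cudaMalloc((void **)&da, bytes));
    CHECK(cudaMalloc((void **)&dout, bytes));
    for (int i = 0; i < n; ++i) a[i] = (float)i;

    CHECK(cudaStreamCreate(&stream[0]));
    CHECK(cudaStreamCreate(&stream[1]));
    CHECK(cudaEventCreateWithFlags(&copyDone, cudaEventDisableTiming));

    // The copy goes to stream[0]; the event marks the point at which it has finished.
    CHECK(cudaMemcpyAsync(da, a, bytes, cudaMemcpyHostToDevice, stream[0]));
    CHECK(cudaEventRecord(copyDone, stream[0]));

    // Work issued to stream[1] after this call does not start until copyDone completes.
    CHECK(cudaStreamWaitEvent(stream[1], copyDone, 0));
    doubleIt<<<n, 1, 0, stream[1]>>>(da, dout);
    CHECK(cudaGetLastError());

    // Same stream as the kernel, so this copy is ordered after it.
    CHECK(cudaMemcpyAsync(result, dout, bytes, cudaMemcpyDeviceToHost, stream[1]));
    CHECK(cudaStreamSynchronize(stream[1]));

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (result[i] != 2.0f * (float)i) ++bad;

    CHECK(cudaEventDestroy(copyDone));
    CHECK(cudaStreamDestroy(stream[0]));
    CHECK(cudaStreamDestroy(stream[1]));
    CHECK(cudaFree(da));
    CHECK(cudaFree(dout));
    CHECK(cudaFreeHost(a));
    return bad == 0 ? 0 : 1;
}
```

Calling `run_stream_demo()` from `main` and checking for a zero return is enough to try it. Without the `cudaEventRecord` / `cudaStreamWaitEvent` pair, the kernel in `stream[1]` could start before the copy in `stream[0]` has finished.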
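
For the second question, the statically allocated case might look like the sketch below. It assumes file-scope `__device__` arrays named `input` and `out` (taken from the question's kernel body) with a hypothetical fixed size `DSIZE`; `run_static_demo` and `CHECK` are likewise illustrative names. The arrays are filled and read back with `cudaMemcpyToSymbol` and `cudaMemcpyFromSymbol` rather than `cudaMemcpy`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define DSIZE 8  // hypothetical array size, chosen for illustration

// Report the failing call and leave the enclosing function (sketch only).
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
            return 1;                                                     \
        }                                                                 \
    } while (0)

// Statically allocated global memory: visible to every kernel in this file.
__device__ float input[DSIZE];
__device__ float out[DSIZE];

// No pointer parameters: the kernel refers to the file-scope arrays directly.
__global__ void kernel() {
    out[threadIdx.x] = 2 * input[threadIdx.x];
}

// Returns 0 if every CUDA call succeeded and the results are as expected.
int run_static_demo() {
    float h_in[DSIZE], h_out[DSIZE];
    for (int i = 0; i < DSIZE; ++i) h_in[i] = (float)i;

    // Populate the __device__ array by symbol, not through a device pointer.
    CHECK(cudaMemcpyToSymbol(input, h_in, sizeof(h_in)));

    kernel<<<1, DSIZE>>>();
    CHECK(cudaGetLastError());

    // Blocking copy in the default stream, so it runs after the kernel has finished.
    CHECK(cudaMemcpyFromSymbol(h_out, out, sizeof(h_out)));

    for (int i = 0; i < DSIZE; ++i)
        if (h_out[i] != 2.0f * (float)i) return 1;
    return 0;
}
```

The trade-off is that the size of a `__device__` array is fixed at compile time, whereas a `cudaMalloc` allocation can be sized at run time but has to be passed to the kernel as a pointer.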