I have the following line of code. gamma is a CPU variable that I will later need to copy to the GPU, and gamma_x and delta are also stored on the CPU. Is there any way I can execute the following line and store its result directly on the GPU? Basically, I want to keep gamma, gamma_x and delta on the GPU and get the output of the following line on the GPU. It would speed up the lines that come after it a lot.
I tried with magma_dcopy, but so far I couldn't find a way to make it work, because magma_ddot returns its result as a CPU (host) double.
gamma = -(gamma_x[i+1] + magma_ddot(i,&d_gamma_x[1],1,&(d_l2)[1],1, queue))/delta;
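The very short answer is no, you can't do this, or at least not if you use magma_ddot. However, magma_ddot is itself only a very thin wrapper around cublasDdot, and the cuBLAS function fully supports having the result of the operation stored in GPU memory rather than returned to the host. In theory you could take the cuBLAS handle that MAGMA uses for your queue, switch it to device pointer mode with cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE), and call cublasDdot directly with a result pointer that points to a double in GPU memory.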
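A minimal sketch of that idea, written against plain cuBLAS so it compiles without MAGMA. The helper dot_to_device, the CUBLAS_CHECK macro and the device scalar d_dot are hypothetical names. The helper saves the handle's current pointer mode, switches to CUBLAS_POINTER_MODE_DEVICE so that cublasDdot writes into GPU memory instead of blocking to return a host double, and then restores the previous mode so later MAGMA calls on the same handle see the state they expect.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cublas_v2.h>

// Print the failing status and abort if a cuBLAS call does not succeed.
#define CUBLAS_CHECK(call)                                          \
    do {                                                            \
        cublasStatus_t s_ = (call);                                 \
        if (s_ != CUBLAS_STATUS_SUCCESS) {                          \
            std::fprintf(stderr, "cuBLAS error %d at %s:%d\n",      \
                         (int)s_, __FILE__, __LINE__);              \
            std::exit(EXIT_FAILURE);                                \
        }                                                           \
    } while (0)

// Hypothetical helper: computes dot(x, y) over n elements and stores the
// result in *d_result, which must point to GPU memory. In device pointer
// mode cublasDdot can return before the result is ready; the value is
// written in stream order on the handle's stream.
void dot_to_device(cublasHandle_t handle, int n,
                   const double *d_x, int incx,
                   const double *d_y, int incy,
                   double *d_result)
{
    cublasPointerMode_t old_mode;
    CUBLAS_CHECK(cublasGetPointerMode(handle, &old_mode));
    CUBLAS_CHECK(cublasSetPointerMode(handle, CUBLAS_POINTER_MODE_DEVICE));
    CUBLAS_CHECK(cublasDdot(handle, n, d_x, incx, d_y, incy, d_result));
    // Restore whatever mode the caller (e.g. MAGMA) had set on the handle.
    CUBLAS_CHECK(cublasSetPointerMode(handle, old_mode));
}
```

Assuming MAGMA 2.x, where magma_queue_get_cublas_handle(queue) returns the handle bound to a queue, the call from the question would become something like dot_to_device(magma_queue_get_cublas_handle(queue), i, &d_gamma_x[1], 1, &d_l2[1], 1, d_dot), with d_dot a double* allocated on the device (for instance with magma_dmalloc).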
Note that this might make MAGMA blow up, depending on how you are using it, because MAGMA uses cuBLAS internally, and how cuBLAS state and asynchronous operations are handled inside MAGMA is completely undocumented. Having said that, if you are careful, it should be OK.
To then execute your calculation, either write a very simple kernel and launch it with a single thread, or perhaps use a simple Thrust call with a lambda expression, depending on your preference. I leave that as an exercise for the reader.
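A minimal sketch of the single-thread kernel option, assuming the dot product has already been written to a device scalar d_dot (for example by cublasDdot in device pointer mode) and that delta has been copied to the device as d_delta. The names compute_gamma_kernel, compute_gamma, d_dot, d_delta and d_gamma are hypothetical. The kernel evaluates the expression from the question entirely in GPU memory.

```cuda
#include <cuda_runtime.h>

// One-thread kernel that evaluates
//   gamma = -(gamma_x[i+1] + dot) / delta
// using only device memory, so nothing is copied back to the host.
__global__ void compute_gamma_kernel(const double *d_gamma_x, int i,
                                     const double *d_dot,
                                     const double *d_delta,
                                     double *d_gamma)
{
    *d_gamma = -(d_gamma_x[i + 1] + *d_dot) / *d_delta;
}

// Launch on the given stream so the kernel is ordered after whatever
// produced *d_dot on that same stream. Returns any launch error.
cudaError_t compute_gamma(const double *d_gamma_x, int i,
                          const double *d_dot, const double *d_delta,
                          double *d_gamma, cudaStream_t stream)
{
    compute_gamma_kernel<<<1, 1, 0, stream>>>(d_gamma_x, i, d_dot,
                                              d_delta, d_gamma);
    return cudaGetLastError();
}
```

In a MAGMA 2.x program the stream would presumably come from magma_queue_get_cuda_stream(queue), so the kernel runs after the dot product issued on the same queue without any host synchronization in between. A one-thread launch does almost no useful work, but the point is that the host never has to stop and wait for a device-to-host copy of the dot product.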