I have the following line of code (given at the end of this post).
gamma is a CPU variable that I will afterwards need to copy to the GPU.
delta is also stored on the CPU. Is there any way to execute that line and store its result directly on the GPU? Basically, I would like to keep
delta on the GPU and get the output of that line on the GPU as well. That would speed up the subsequent lines of my code a lot.
I tried
magma_dcopy, but so far I couldn't get it to work, because the output of
magma_ddot is a CPU double.
gamma = -(gamma_x[i+1] + magma_ddot(i,&d_gamma_x,1,&(d_l2),1, queue))/delta;