How to Synchronize with Julia CUDArt?

I'm just starting to use Julia's CUDArt package to manage GPU computing. How can I ensure that when I pull data from the GPU (e.g. using to_host()), I don't do so before all of the necessary computations on it have finished?

Through some experimentation, it seems that to_host(CudaArray) will lag while the particular CudaArray is being updated. So, perhaps just using this is enough to ensure safety? But it seems a bit chancy.

Right now, I am using the launch() function to run my kernels, as depicted in the package documentation.

The CUDArt documentation gives an example using Julia's @sync macro, which seems like it could be lovely. But as far as @sync is concerned, my "work" is done as soon as the kernel has been launched with launch(), not once the kernel finishes. As far as I understand launch(), there isn't a way to change this behavior (e.g. to make it wait for the output of the function it "launches").

How can I accomplish such synchronization?

There are 2 answers

Chris Rackauckas (best answer)

I think the more canonical way is to make a stream for each device:

streams = [(device(dev); Stream()) for dev in devlist]

and then, inside the @async block, after you have queued the computations, call wait(stream) to block that task until the stream has finished its computations. See the Streams example in the README.
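To make that concrete, here is a rough sketch of the pattern, modeled on the shape of the README's Streams example. Treat it as an illustration under assumptions rather than tested code: the module path, the kernel name "MyFunc", the single Float32 array argument, and the launch dimensions are all placeholders, and I am assuming launch() accepts a stream keyword as it does in the README.

```julia
using CUDArt

# Sketch only. Placeholders/assumptions: the .ptx path, the kernel name
# "MyFunc", a kernel taking one Float32 device array, and the launch sizes.
results = devices(dev -> true) do devlist
    # One stream per device, created while that device is active.
    streams = [(device(dev); Stream()) for dev in devlist]
    out = Any[nothing for dev in devlist]

    @sync begin
        for (i, dev) in enumerate(devlist)
            @async begin
                device(dev)                                   # select this task's device
                md = CuModule("/path/to/module.ptx", false)   # placeholder path
                kernel = CuFunction(md, "MyFunc")             # placeholder kernel name
                d_arr = CudaArray(zeros(Float32, 64))         # device copy of a host array

                # Queue the kernel on this device's stream; launch returns right away.
                launch(kernel, 1, 64, (d_arr,); stream=streams[i])

                # Block this task (not the whole process) until the stream is done.
                wait(streams[i])

                device(dev)               # re-select, in case another task switched devices
                out[i] = to_host(d_arr)   # the kernel has finished, so this copy is safe
                unload(md)
            end
        end
    end
    out
end
```

The point of the design is that @sync only returns once every @async task has finished, and each task only finishes after its wait(stream) returns, so the host copies cannot run ahead of the kernels.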

Michael Ohlrogge

OK, so there isn't a ton of documentation on the CUDArt package, but I looked at the source code and it looks straightforward. In particular, there appears to be a device_synchronize() function that blocks until all of the work on the currently active device has finished. The following seems to work:

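using CUDArt
# Load the compiled PTX module and look up the kernel by name
md = CuModule("/path/to/module.ptx", false)
MyFunc = CuFunction(md, "MyFunc")
GridDim = 2*2496
BlockDim = 64
# launch returns as soon as the kernel has been queued
launch(MyFunc, GridDim, BlockDim, (arg1, arg2, ...))
# Block until all work on the active device has finished
device_synchronize()
# Now it is safe to copy the result back to the host
res = to_host(arg2)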

I'd love to hear from anyone with more expertise, though, if there is anything more to be aware of here.