I would like some clarification about unified memory: how it really works and how to use it effectively.
As far as I know, we use cudaMallocManaged(ptr, size);
to allocate an array of unified memory. Since the Pascal architecture, it is possible to set the size greater than the physical memory available on the GPU.
Suppose now I have a GPU with 4GB of RAM, a host with 32GB of RAM, and a file of 1TB. I want to address this 1TB file; how do I proceed?
If I understand correctly, I can fit the file in unified memory, but how is the link between this unified array and the file established? Does it mean I have to memcpy
the whole file into the pointer I have allocated with cudaMallocManaged?
Finally, tell me if I'm right: if a miss is raised by the GPU, the CPU will send the data it has stored in its RAM, and otherwise fetch it from the disk. It's a bit simplified, but if it works like this, it means the data needs to be in the unified array.
Thank you for your help.
My response assumes you are running CUDA 9.x or higher, a Pascal or Volta GPU, on Linux.
You will be able to oversubscribe GPU memory up to approximately the size of host memory (i.e. whatever the host operating system allows you to allocate), less some reasonable margin typical of any memory allocation process: you should not expect to allocate every last byte of host memory, and likewise should not attempt the same with managed memory allocations.
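As a minimal sketch of what oversubscription looks like in practice (assuming a Pascal-or-later GPU on Linux; the 8 GB size is illustrative, chosen to exceed the 4 GB GPU in the question while staying well under 32 GB of host RAM):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Touch every byte so pages migrate to the GPU on demand.
__global__ void touch(char *buf, size_t n) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        buf[i] = 1;
}

int main() {
    size_t n = 8ULL << 30;   // 8 GB: larger than GPU memory, smaller than host memory
    char *buf = nullptr;
    cudaError_t err = cudaMallocManaged(&buf, n);
    if (err != cudaSuccess) {
        printf("allocation failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    touch<<<256, 256>>>(buf, n);   // pages fault in and migrate as the kernel touches them
    cudaDeviceSynchronize();

    cudaFree(buf);
    return 0;
}
```

The kernel never fits all 8 GB on the device at once; the demand-paging mechanism migrates (and evicts) pages as they are touched.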
There is no link between unified memory and files or anything stored on disk.
Just as you probably cannot load that entire 1TB file into your 32GB of RAM, you cannot access it all at once using managed memory. Whatever amount the host operating system will allow you to allocate/load, is the size you will have available to the GPU.
Therefore, in order to process that 1TB file, you would probably need to come up with an algorithm that breaks it into pieces that fit in system RAM. This concept is completely independent of managed memory. Thereafter, if you want to use CUDA to access the piece of your file that is in system RAM, you can use managed memory (including oversubscription, if you wish) to do so.
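A sketch of that chunked pattern (the chunk size and the per-byte kernel are illustrative assumptions, not a prescribed recipe): ordinary host-side file I/O fills a managed buffer one piece at a time, and a kernel then processes that piece.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder per-byte work; your real processing goes here.
__global__ void process(unsigned char *chunk, size_t n) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        chunk[i] ^= 0xFF;
}

int main(int argc, char **argv) {
    if (argc < 2) { fprintf(stderr, "usage: %s <file>\n", argv[0]); return 1; }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("fopen"); return 1; }

    const size_t CHUNK = 1ULL << 30;   // 1 GB per piece: fits comfortably in host RAM
    unsigned char *buf = nullptr;
    if (cudaMallocManaged(&buf, CHUNK) != cudaSuccess) { fclose(f); return 1; }

    size_t got;
    while ((got = fread(buf, 1, CHUNK, f)) > 0) {   // host fills the managed buffer with plain file I/O
        process<<<256, 256>>>(buf, got);            // GPU faults pages over as it touches them
        cudaDeviceSynchronize();                    // finish before the host overwrites the buffer
        // ... write results back out here if needed ...
    }

    cudaFree(buf);
    fclose(f);
    return 0;
}
```

Note there is still no link between the managed allocation and the file itself: the fread is what moves file data into the unified array, exactly as the question suspected.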
The exact process of breaking your file into pieces will depend on exactly what sort of processing you are doing, and has no particular dependency on CUDA.