Question 1)
When I use the CUDA driver API, I usually need to first push a context (which is tied to a specific GPU) onto the current thread. For an ordinary cuMemAlloc, the memory is allocated on the GPU specified by that context. But if I call cuMemAllocManaged to create unified memory, do I still need to push a GPU context?
Question 2)
Say I have 2 GPUs, each with 1 GB of DRAM. Can I allocate 2 GB of unified memory, with each GPU holding half of it?
Answer 1) Yes. Follow the established driver API programming model: explicitly establish a CUDA context and make it current on the thread, just as you would before cuMemAlloc.
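A minimal sketch of what that looks like with the driver API (not your code; the size and device ordinal are illustrative, and error handling is reduced to a macro):

```c
#include <cuda.h>
#include <stdio.h>

#define CHECK(call) do { CUresult r = (call); if (r != CUDA_SUCCESS) { \
    fprintf(stderr, "CUDA error %d at line %d\n", r, __LINE__); return 1; } } while (0)

int main(void) {
    CUdevice dev;
    CUcontext ctx;
    CUdeviceptr p;

    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev, 0));
    CHECK(cuCtxCreate(&ctx, 0, dev));   /* context is now current on this thread */

    /* A managed allocation still requires a current context,
     * just like an ordinary cuMemAlloc */
    CHECK(cuMemAllocManaged(&p, 1 << 20, CU_MEM_ATTACH_GLOBAL));

    CHECK(cuMemFree(p));
    CHECK(cuCtxDestroy(ctx));
    return 0;
}
```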
Answer 2) No, that is not how managed memory works. A managed allocation is visible, in its entirety, to all GPUs in the system. This is true whether we are talking about a pre-Pascal UM regime or a pure Pascal-or-newer (demand-paging) UM regime, although the specific mechanism of visibility varies. Refer to the programming guide sections on UM with multi-GPU.
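As a sketch, assuming a system with two Pascal-or-newer GPUs at ordinals 0 and 1 running Linux (where demand paging and oversubscription are available): you can make one 2 GB allocation that both GPUs see in full, and optionally use cuMemPrefetchAsync to steer which half is physically resident on which device. The ordinals and size are illustrative, not from the question.

```c
#include <cuda.h>
#include <stdio.h>

#define CHECK(call) do { CUresult r = (call); if (r != CUDA_SUCCESS) { \
    fprintf(stderr, "CUDA error %d at line %d\n", r, __LINE__); return 1; } } while (0)

int main(void) {
    CUdevice dev0, dev1;
    CUcontext ctx;
    CUdeviceptr p;
    size_t total = 2ULL << 30;                  /* 2 GB */

    CHECK(cuInit(0));
    CHECK(cuDeviceGet(&dev0, 0));
    CHECK(cuDeviceGet(&dev1, 1));
    CHECK(cuCtxCreate(&ctx, 0, dev0));

    /* One allocation, visible in its entirety to both GPUs and the host */
    CHECK(cuMemAllocManaged(&p, total, CU_MEM_ATTACH_GLOBAL));

    /* Optional (Pascal+ Linux only): migrate each half to a different GPU.
     * Either GPU can still touch any byte; pages move on demand. */
    CHECK(cuMemPrefetchAsync(p,             total / 2, dev0, NULL));
    CHECK(cuMemPrefetchAsync(p + total / 2, total / 2, dev1, NULL));
    CHECK(cuCtxSynchronize());

    CHECK(cuMemFree(p));
    CHECK(cuCtxDestroy(ctx));
    return 0;
}
```

Note that the prefetch calls are only a residency hint, not a partitioning of visibility: the allocation remains a single, fully shared 2 GB region. On a Pascal-or-newer Linux system, managed allocations can also exceed the memory of any single GPU (oversubscription), with pages migrating on demand.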