I want to know the role of the OS when initiating RDMA. Who initiates it, the OS or the CPU? What happens to the OS after the RDMA starts?
Remote Direct Memory Access and OS
1.2k views. Asked by gpuguy. There are 3 answers.
Your program, running on the CPU along with the OS, initiates the RDMA transfer. It's responsible for all the API calls that set up the memory regions that can be read or written through RDMA. The OS is the mediator between your program and the RDMA-capable hardware.
The OS comes in through the calls your program makes. Some of them are handled by kernel drivers and some by user-space libraries.
One call that must precede the RDMA transfer is an OS system call to create pinned memory, that is, memory that can't be paged out of RAM.
Another API call registers that pinned memory region with the InfiniBand HCA or RDMA NIC.
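As a rough illustration of what "pinned" means at the OS level, the sketch below asks the kernel to lock one page in RAM with the POSIX `mlock` call, reached from Python through `ctypes`. It assumes a Linux or other POSIX system. `pin_one_page` is a name made up for this example, and real RDMA stacks usually pin pages as part of memory registration rather than through a separate `mlock` call.

```python
import ctypes
import mmap


def pin_one_page() -> bool:
    """Try to lock one anonymous page in RAM; return True if the OS allowed it."""
    libc = ctypes.CDLL(None, use_errno=True)   # C library already loaded in this process
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

    page = mmap.mmap(-1, mmap.PAGESIZE)        # anonymous, page-aligned mapping
    view = ctypes.c_char.from_buffer(page)     # lets us obtain the mapping's address
    addr = ctypes.addressof(view)
    try:
        locked = libc.mlock(addr, mmap.PAGESIZE) == 0
        if locked:
            libc.munlock(addr, mmap.PAGESIZE)  # release the lock again
        return locked
    finally:
        del view                               # drop the buffer export before closing
        page.close()


print("page could be pinned:", pin_one_page())
```

Whether the call succeeds depends on the process's locked-memory limit, so the function reports the outcome instead of assuming it.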
And still other calls are there to set things up for the transfer and configure various parameters.
There are also send/recv calls, needed for flow control, that aren't RDMA operations but likewise complete asynchronously.
Finally there are the RDMA read and write calls themselves. While those transfers are in flight, the CPU does no work to move the data.
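The overall shape of these calls (post a transfer, carry on, collect a completion later) can be mimicked with a toy model in plain Python. Nothing here talks to real hardware or to the verbs API: `ToyNic`, `post_write`, and `poll_completion` are invented names, a background thread stands in for the NIC's DMA engine, and a `queue.Queue` stands in for the completion queue.

```python
import queue
import threading


class ToyNic:
    """Stand-in for an RDMA NIC: copies data on its own thread, then reports completion."""

    def __init__(self) -> None:
        self._completions = queue.Queue()

    def post_write(self, wr_id: int, src: bytes, dst: bytearray, offset: int) -> None:
        """Queue a copy and return immediately; the caller does not perform the copy."""
        def engine() -> None:
            dst[offset:offset + len(src)] = src   # the "DMA" transfer
            self._completions.put(wr_id)          # report that wr_id has finished

        threading.Thread(target=engine, daemon=True).start()

    def poll_completion(self, timeout: float = 1.0) -> int:
        """Wait until some posted write finishes and return its id."""
        return self._completions.get(timeout=timeout)


remote_memory = bytearray(16)   # plays the registered region on the peer
nic = ToyNic()
nic.post_write(wr_id=7, src=b"hello", dst=remote_memory, offset=4)
# The application thread is free to do other work here.
done = nic.poll_completion()
print(done, bytes(remote_memory[4:9]))   # prints: 7 b'hello'
```

The point of the split between `post_write` and `poll_completion` is that the caller only issues the request and later learns that it finished; the copying itself happens elsewhere.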
RDMA is actually fairly hard to use. I've been starting to support it in Isis2 (Isis2.codeplex.com), a system we created at Cornell for data replication, fault-tolerance and distributed consistency. Mostly one uses it on cloud platforms like EC2, but you can also configure Isis2 to run in other Linux or Windows settings, over UDP, IPMC, TCP or RDMA (currently tested only with InfiniBand, but we'll be testing on RDMA Ethernet shortly).
What I can say is that I have honestly never found a technology harder to work with. RDMA is more of a hardware feature than anything one would normally use directly.
My suggestion: use RDMA from MPI (a widely used system for high-performance computing) or from my Isis2 library. Don't try to use it directly.
What does "initiating RDMA" mean? Is it starting to actually read/write data with RDMA, or all the preparation needed to enable RDMA?
Anyway, RDMA describes the ability of a NIC/HCA to access memory on a remote machine through that machine's NIC/HCA without the CPUs being involved, on either the local or the remote machine. RDMA also includes DMA, which means that a network card can access physical memory on the local machine without the CPUs being involved.
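The network card gets an order of the following type: copy the contents of a buffer at a given local address into a given address in the remote machine's memory.
This example operation is called RDMA Write.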
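An order of this kind can be pictured as a small descriptor. The sketch below is only a toy: `WriteOrder` and `execute` are hypothetical names, the "addresses" are offsets into two `bytearray` objects standing in for the two machines' memories, and the key is carried along but not checked.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteOrder:
    """Hypothetical RDMA Write descriptor; addresses are offsets into toy memories."""
    local_addr: int    # where the data starts on the sender
    length: int        # how many bytes to move
    remote_addr: int   # where the data should land on the receiver
    rkey: int          # key the receiver handed out for its region (unchecked here)


def execute(order: WriteOrder, local_mem: bytearray, remote_mem: bytearray) -> None:
    """What the two NICs do between them: copy the bytes; no receiver code runs."""
    data = local_mem[order.local_addr:order.local_addr + order.length]
    remote_mem[order.remote_addr:order.remote_addr + order.length] = data


sender = bytearray(b"....payload.....")
receiver = bytearray(16)
execute(WriteOrder(local_addr=4, length=7, remote_addr=0, rkey=0x1234), sender, receiver)
print(bytes(receiver[:7]))   # prints: b'payload'
```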
The order is given to the network card by the driver (which, in turn, can be given a request by some application). From this moment the card does all the work without the CPUs on either side (that is, without the OS).
When the command is completed, the network card on the sender side may generate an event that will be picked up by the driver (OS). On the receiver side, the OS will be completely unaware that an RDMA Write operation has just taken place. The receiver will need to either periodically check the memory at the requested address to know when the data has arrived, or have some other mechanism (there are many options; I don't want to go into too much detail).
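The "check the memory periodically" option might look roughly like the following. This is a simulation, not real RDMA: a Python thread plays the remote NIC, `wait_for_flag` and `fake_remote_write` are made-up names, and the sketch assumes the writer sets a flag byte only after the payload is in place, which a real design has to ensure for its transport.

```python
import threading
import time


def wait_for_flag(memory: bytearray, flag_index: int, timeout: float = 2.0) -> bool:
    """Check one byte repeatedly until it becomes non-zero or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if memory[flag_index] != 0:
            return True
        time.sleep(0.001)   # back off briefly between checks
    return False


def fake_remote_write(memory: bytearray, payload: bytes) -> None:
    """Plays the sender's NIC: payload first, then the flag byte right after it."""
    memory[:len(payload)] = payload
    memory[len(payload)] = 1


region = bytearray(8)
writer = threading.Thread(target=fake_remote_write, args=(region, b"data"))
writer.start()
arrived = wait_for_flag(region, flag_index=4)
writer.join()
print(arrived, bytes(region[:4]))   # prints: True b'data'
```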
There are other RDMA commands, like RDMA Read, but I think that the main idea is clear by now.
Note, however, that in order to be able to conduct an RDMA command, the driver has to prepare all the infrastructure: the source and destination memory buffers have to be registered and pinned to prevent them from being swapped out during the RDMA command execution, the local network card has to know the remote machine's memory key, etc. All these preparations are done by the driver on both machines.
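The role of registration and the memory key can be pictured with a toy registry. This is an assumption-laden sketch, not how any particular driver stores its state: `ToyRegistry`, `register`, and `allows` are invented names, and the key is simply a random 32-bit number.

```python
import secrets


class ToyRegistry:
    """Hypothetical table a NIC might keep: key -> (start, length) of a registered region."""

    def __init__(self) -> None:
        self._regions = {}

    def register(self, start: int, length: int) -> int:
        """Record a region and hand back a key the peer must present."""
        key = secrets.randbits(32)
        while key in self._regions:      # avoid reusing a key already handed out
            key = secrets.randbits(32)
        self._regions[key] = (start, length)
        return key

    def allows(self, key: int, addr: int, length: int) -> bool:
        """True only if the key is known and [addr, addr + length) lies inside its region."""
        region = self._regions.get(key)
        if region is None:
            return False
        start, size = region
        return start <= addr and addr + length <= start + size


registry = ToyRegistry()
rkey = registry.register(start=0x1000, length=256)
print(registry.allows(rkey, 0x1000, 256))      # True: the whole region
print(registry.allows(rkey, 0x10F0, 32))       # False: runs past the end
print(registry.allows(rkey ^ 1, 0x1000, 16))   # False: unknown key
```

In this picture the remote side hands `rkey` to the local side during setup, and a request that presents a wrong key or reaches outside the registered range is refused.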