RDMA scatter/gather is a nice way to consolidate data transfers. For example, the verbs API allows data at multiple local locations to be written to a remote buffer with a SINGLE RDMA write operation; likewise, data in a remote buffer can be read into multiple local locations with a SINGLE RDMA read operation.
However, I cannot initiate a single RDMA operation that writes to multiple locations on the remote side (or reads from multiple locations on the remote side). This feature is appealing to us because it would use the wide RDMA lanes efficiently for multiple small writes. I also checked the Intel qsm APIs and the Cray gni APIs. It seems none of them supports such a feature; let's call it "writer-controlled remote scatter". Is there a deep reason this is not supported?
I do not have a good explanation for why the verbs interface does not support it, since it could definitely be implemented in hardware.
However, there are at least two ways to do this more efficiently than posting separate operations one by one:
1. Easier way: post a list of RDMA requests at once, one per remote location, and request a completion entry only for the last one. This will give better performance than posting them one by one.
2. More advanced: create a "UMR" (User-mode Memory Registration) on the remote host that groups all of those locations into one contiguous virtual MR. You can then target that remote virtual MR with a single post operation.