We currently run XDP code in driver mode on the server's CPU. We are evaluating new NICs and are considering running the XDP code on a data processing unit (DPU) instead, so that it executes on the DPU's CPU rather than the server's. Our current XDP code (1) blocks traffic by filtering packets against a BPF_MAP_TYPE_HASH and (2) inserts flows into a BPF_MAP_TYPE_RINGBUF, which the application picks up and processes.
From what I can see, there are not many resources on running XDP code on a DPU. Is this a valid configuration? Would it bring performance benefits? Would it be possible to run the XDP code in HW offload mode on the DPU, thus making the internals of the DPU transparent? If so, would offload mode offer all the flexibility of running on the server CPUs, such as tail calls?