KafkaDirect
I'm attempting to install KafkaDirect from its GitHub repository to enable RDMA communication in Kafka.
My environment is as follows:
- OS: Ubuntu 20.04
- Cluster: three nodes (Node1, Node2, Node3)
- NIC: Mellanox ConnectX-3 (InfiniBand)
KafkaDirect adapts DiSNI, a Java-based RDMA API, for integration with Kafka. I have completed the installation process outlined in the README, including installing ktaranov's fork of DiSNI and KafkaDirect itself.
With ZooKeeper and the Kafka cluster running on all three nodes, two issues have arisen:
First, when I compare performance using the benchmarking tool mentioned in the KafkaDirect README, Kafka without RDMA is significantly faster than Kafka with RDMA.
Second, when I set the replication factor of a topic's partitions to more than 2 and produce data, no replication happens between the brokers at all.
While sending data to the topic from the producer application or the benchmarking tool, I can see traffic in the InfiniBand RDMA monitoring tool (collectl). However, replication between the brokers does not occur at all. (Both the ZooKeeper and Kafka configuration files have been set up for the three-node cluster.)
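One way to confirm the replication symptom is `kafka-topics.sh --describe --topic <topic>`, which prints one line per partition with its `Replicas` and `Isr` lists; a partition is under-replicated when an assigned replica is missing from its ISR. The Java sketch below applies that check to a single describe line. The class and method names are hypothetical, and it assumes the Kafka 2.x tab-separated output format used in `main`, which may differ in other versions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper: flags an under-replicated partition from one line printed by
 * `kafka-topics.sh --describe`. Assumes the Kafka 2.x per-partition format
 * "Topic: t  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1" (tab-separated).
 */
public class UnderReplicatedCheck {

    /** Returns the comma-separated broker ids that follow `key` (e.g. "Replicas:"). */
    public static Set<String> brokerIds(String line, String key) {
        int start = line.indexOf(key);
        if (start < 0) {
            throw new IllegalArgumentException("missing " + key + " in: " + line);
        }
        // The id list is the first whitespace-delimited token after the key.
        String rest = line.substring(start + key.length()).trim();
        String token = rest.split("\\s+")[0];
        return new HashSet<>(Arrays.asList(token.split(",")));
    }

    /** True when at least one assigned replica is missing from the ISR. */
    public static boolean isUnderReplicated(String describeLine) {
        Set<String> replicas = brokerIds(describeLine, "Replicas:");
        Set<String> isr = brokerIds(describeLine, "Isr:");
        return !isr.containsAll(replicas);
    }

    public static void main(String[] args) {
        // "test" is a hypothetical topic name.
        String healthy = "Topic: test\tPartition: 0\tLeader: 1\tReplicas: 1,2,3\tIsr: 1,2,3";
        String broken  = "Topic: test\tPartition: 0\tLeader: 1\tReplicas: 1,2,3\tIsr: 1";
        System.out.println(isUnderReplicated(healthy)); // false
        System.out.println(isUnderReplicated(broken));  // true
    }
}
```

Kafka's `kafka-topics.sh` also accepts `--under-replicated-partitions` together with `--describe` to list such partitions directly.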
[Screenshot: benchmarking tool output without RDMA]
Without RDMA, the benchmark takes around 4 seconds; with RDMA, it takes as long as 55 seconds.
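Because both runs use the same benchmark workload, the two wall-clock times translate directly into a relative slowdown and a relative throughput. The small Java sketch below does that arithmetic with the approximate figures reported above; the class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper comparing two wall-clock times for the same benchmark workload. */
public class Slowdown {

    /** How many times longer the RDMA run took than the non-RDMA (baseline) run. */
    public static double slowdown(double baselineSeconds, double rdmaSeconds) {
        if (baselineSeconds <= 0 || rdmaSeconds <= 0) {
            throw new IllegalArgumentException("durations must be positive");
        }
        return rdmaSeconds / baselineSeconds;
    }

    /** RDMA throughput as a fraction of baseline throughput (same record count, so t_base / t_rdma). */
    public static double relativeThroughput(double baselineSeconds, double rdmaSeconds) {
        return 1.0 / slowdown(baselineSeconds, rdmaSeconds);
    }

    public static void main(String[] args) {
        // Approximate figures: ~4 s without RDMA, ~55 s with RDMA.
        double slow = slowdown(4.0, 55.0);
        double frac = relativeThroughput(4.0, 55.0);
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.printf(Locale.ROOT, "slowdown: %.2fx, relative throughput: %.1f%%%n", slow, frac * 100);
        // prints "slowdown: 13.75x, relative throughput: 7.3%"
    }
}
```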
Because replication between the brokers is not happening, the followers are removed from the ISR (in-sync replicas) list.
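Dropping followers from the ISR is standard Kafka behavior when they fall behind: the partition leader removes a follower that has not caught up to its log end within `replica.lag.time.max.ms`. The Java sketch below is a simplified model of that rule, not Kafka's source code; the class and method names are hypothetical, and the 30 s threshold is an assumed value, since the default differs between Kafka versions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Simplified model (not Kafka's actual implementation) of ISR membership: a follower
 * stays in the ISR only if it caught up to the leader's log end offset within the
 * last replicaLagTimeMaxMs milliseconds.
 */
public class IsrModel {

    /** Returns the follower ids that would remain in the ISR at time nowMs. */
    public static Set<Integer> inSyncFollowers(Map<Integer, Long> lastCaughtUpMs,
                                               long nowMs, long replicaLagTimeMaxMs) {
        Set<Integer> isr = new TreeSet<>();
        for (Map.Entry<Integer, Long> e : lastCaughtUpMs.entrySet()) {
            // A follower whose fetches never reach the leader's log end keeps an old
            // timestamp and is dropped once its lag exceeds the threshold.
            if (nowMs - e.getValue() <= replicaLagTimeMaxMs) {
                isr.add(e.getKey());
            }
        }
        return isr;
    }

    public static void main(String[] args) {
        Map<Integer, Long> lastCaughtUp = new LinkedHashMap<>();
        lastCaughtUp.put(2, 95_000L); // broker 2 caught up 5 s before "now"
        lastCaughtUp.put(3, 0L);      // broker 3 has not caught up since startup
        // Assumed 30 s threshold; check the brokers' actual replica.lag.time.max.ms.
        System.out.println(inSyncFollowers(lastCaughtUp, 100_000L, 30_000L)); // [2]
    }
}
```

Under this model, followers that never complete a successful fetch are removed no matter how large the threshold is, so the ISR shrinkage would be consistent with the follower fetch path between brokers not making progress, even while producer traffic is visible.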
Since traffic does appear during production, which suggests that RDMA communication itself is working, I would appreciate any brief insights into what might be causing these issues. Thank you.