Intermittent "Node X disconnected" messages from Kafka clients despite changing batch.size and linger.ms


I am trying to consume data from Kafka and publish data to Kafka without getting intermittent "Node X disconnected" messages.

I am using this reactive Kafka (Reactor Kafka) consumer and producer configuration:

```java
public KafkaReceiver<String, String> kafkaReceiver(final MeterRegistry registry, final ObservationRegistry observationRegistry) {
    final Map<String, Object> properties = new HashMap<>();
    // SSL_PROTOCOL and SSL_VALUE are application constants (presumably security.protocol = SSL)
    properties.put(SSL_PROTOCOL, SSL_VALUE);
    properties.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, keyStoreLocation);
    properties.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, keyStorePassphrase);
    properties.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, trustStoreLocation);
    properties.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, trustStorePassphrase);
    // All three brokers go inside a single comma-separated string
    properties.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-1.com:9092,kafka-2.com:9092,kafka-3.com:9092");
    properties.put(ConsumerConfig.CLIENT_ID_CONFIG, consumerGroup);
    properties.put(ConsumerConfig.GROUP_ID_CONFIG, consumerGroup);
    properties.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, startOffset);
    // ... rest of the method not shown
}
```
```java
public KafkaSender<String, Log> kafkaSender(final MeterRegistry registry, final ObservationRegistry observationRegistry) {
    final Map<String, Object> properties = new HashMap<>();
    // SSL_PROTOCOL and SSL_VALUE are application constants (presumably security.protocol = SSL)
    properties.put(SSL_PROTOCOL, SSL_VALUE);
    properties.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, keyStoreLocation);
    properties.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, keyStorePassphrase);
    properties.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, trustStoreLocation);
    properties.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, trustStorePassphrase);
    // All three brokers go inside a single comma-separated string
    properties.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-1.com:9092,kafka-2.com:9092,kafka-3.com:9092");
    // ... rest of the method not shown
}
```

The application does something very simple: it consumes records from one topic and sends them to another topic.

Whether the load is light or high, I always see these log messages:

```
1 INFO --- [reactive-kafka] [aaa,,] o.apache.kafka.clients.NetworkClient : [Consumer clientId=aaa-] Node -1 disconnected.
1 INFO --- [kafka-producer-network-thread | producer-1] [aaa,,] o.apache.kafka.clients.NetworkClient : [Producer clientId=producer-1] Node -1 disconnected.
1 INFO --- [kafka-producer-network-thread | producer-1] [aaa,,] o.apache.kafka.clients.NetworkClient : [Producer clientId=producer-1] Node 5 disconnected.
```

The node ID in these messages is sometimes -1 and sometimes another number (such as 5).
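As far as I understand the Kafka Java client, the node ID tells you which kind of connection was closed: each entry in bootstrap.servers is given a placeholder negative ID (-1, -2, ...), while non-negative IDs such as 5 are real broker IDs. The sketch below is only a diagnostic aid; the NodeDisconnectClassifier class and its regex are hypothetical and written against the log lines above, not part of any Kafka API. It extracts the ID from a log line and labels it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NodeDisconnectClassifier {
    // Matches the tail of NetworkClient messages such as "Node -1 disconnected."
    private static final Pattern NODE_DISCONNECTED =
            Pattern.compile("Node (-?\\d+) disconnected\\.");

    /** Returns the node ID found in the line, or null if it is not a disconnect message. */
    public static Integer nodeId(String logLine) {
        Matcher m = NODE_DISCONNECTED.matcher(logLine);
        return m.find() ? Integer.valueOf(m.group(1)) : null;
    }

    /** Negative IDs stand for bootstrap.servers entries; non-negative IDs are broker IDs. */
    public static String classify(int nodeId) {
        return nodeId < 0 ? "bootstrap connection" : "broker " + nodeId;
    }

    public static void main(String[] args) {
        String[] lines = {
            "[Consumer clientId=aaa-] Node -1 disconnected.",
            "[Producer clientId=producer-1] Node 5 disconnected."
        };
        for (String line : lines) {
            Integer id = nodeId(line);
            if (id != null) {
                System.out.println(line + " -> " + classify(id));
            }
        }
    }
}
```

Running main prints one labeled line per sample message, which makes it easier to see whether the disconnects involve the bootstrap connection, a specific broker, or both.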

After reading the documentation, I concluded that the most appropriate properties to tune were batch.size and linger.ms.
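For reference, both settings are producer properties and belong in the producer's property map; the consumer configuration does not use them. The sketch below uses a hypothetical withBatching helper and the plain string keys (equivalent to ProducerConfig.BATCH_SIZE_CONFIG and ProducerConfig.LINGER_MS_CONFIG) so that it runs without Kafka on the classpath. The values 32768 and 20 are arbitrary placeholders, not recommendations.

```java
import java.util.HashMap;
import java.util.Map;

public class BatchingProps {
    /**
     * Returns a copy of the given producer properties with the two batching
     * settings added. Values are illustrative placeholders only.
     */
    public static Map<String, Object> withBatching(Map<String, Object> base, int batchSizeBytes, long lingerMs) {
        Map<String, Object> props = new HashMap<>(base);
        props.put("batch.size", batchSizeBytes); // same key as ProducerConfig.BATCH_SIZE_CONFIG
        props.put("linger.ms", lingerMs);        // same key as ProducerConfig.LINGER_MS_CONFIG
        return props;
    }

    public static void main(String[] args) {
        Map<String, Object> base = new HashMap<>();
        base.put("bootstrap.servers", "kafka-1.com:9092,kafka-2.com:9092,kafka-3.com:9092");
        Map<String, Object> props = withBatching(base, 32_768, 20L);
        // Prints "32768 20"
        System.out.println(props.get("batch.size") + " " + props.get("linger.ms"));
    }
}
```

Copying the map instead of mutating it keeps the base SSL and bootstrap settings reusable when comparing several batching values.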

I have tried many different values, but the issue persists.

I believe the issue comes from my application only, not from Kafka. At any given time when my app shows these messages, hundreds of other applications are interacting with the same Kafka cluster. If the Kafka infrastructure were the cause, I would expect to see the same logs in at least one other application, but only mine shows them.

Are these two properties simply not suited to resolving this issue?

How can I avoid these disconnections?
