Kafka Streams stops consuming partitions after partition leader rebalance


We have experienced an issue that may be caused by the broker parameter auto.leader.rebalance.enable, which is set to true by default.

In detail, when the automatic leader rebalance occurs, for example after a broker restart, some partition leaders are moved back to their preferred leader. After this event, some stateful Kafka Streams applications stop consuming from the source partitions whose leader has moved, and the consumer lag on those partitions starts to grow.
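For context, here is a minimal sketch of how I understand the broker-side decision: the preferred leader of a partition is the first broker in its replica list, and the controller compares each broker's share of "preferred but not currently leading" partitions against leader.imbalance.per.broker.percentage (default 10). The function names and the sample assignment are hypothetical and only illustrate the rule; this is not the controller's actual code.

```python
from collections import defaultdict


def imbalance_by_broker(assignments, leaders):
    """Return, per broker, the fraction of partitions for which it is the
    preferred leader (first replica) but not the current leader."""
    preferred_total = defaultdict(int)
    not_leading = defaultdict(int)
    for partition, replicas in assignments.items():
        preferred = replicas[0]  # preferred leader = first replica in the list
        preferred_total[preferred] += 1
        if leaders[partition] != preferred:
            not_leading[preferred] += 1
    return {b: not_leading[b] / total for b, total in preferred_total.items()}


def brokers_to_rebalance(assignments, leaders, threshold_percent=10):
    """Brokers whose imbalance exceeds the threshold (assumed strict '>')."""
    ratios = imbalance_by_broker(assignments, leaders)
    return sorted(b for b, r in ratios.items() if r * 100 > threshold_percent)


# Hypothetical 3-broker cluster, one topic with 6 partitions.
assignments = {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2],
               3: [1, 3, 2], 4: [2, 1, 3], 5: [3, 2, 1]}
# State right after broker 1 was restarted: it leads nothing yet.
leaders = {0: 2, 1: 2, 2: 3, 3: 3, 4: 2, 5: 3}

print(brokers_to_rebalance(assignments, leaders))  # prints [1]
```

In this example, partitions 0 and 3 are the ones whose leader would be moved back to broker 1 at the next check, and those are the kind of partitions on which we see our applications stall.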

Is it a known issue? Why don't applications receive the information regarding the change of leader?

The workaround we found for when we need to perform a rolling restart of the brokers is:

  1. Stop the stateful applications.
  2. Perform the rolling restart of the brokers.
  3. Wait 5 minutes (the default check interval) until the automatic leader rebalance occurs.
  4. Start the stateful applications.
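For reference, these are the broker settings I believe are involved, shown with their documented defaults. The 5-minute wait in step 3 corresponds to leader.imbalance.check.interval.seconds. This is a sketch of the relevant part of server.properties, not a copy of our actual configuration.

```properties
# Let the controller move leadership back to the preferred replica automatically
auto.leader.rebalance.enable=true
# How often the controller checks for leader imbalance (300 s = 5 minutes)
leader.imbalance.check.interval.seconds=300
# Per-broker imbalance percentage above which a preferred leader election is triggered
leader.imbalance.per.broker.percentage=10
```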

We are using Confluent Platform Community 5.2.2, deployed on a 3-node on-prem cluster.

We are trying to reproduce the problem in our test environment, so far without success. Is it possible that it depends on the load on the cluster, which is much lower in test?

Thanks in advance! Giorgio
