Kafka duplicate consumption expected as a result of rebalancing?

273 views Asked by At

This is a conceptual description of a Kafka duplicate consumption scenario that we are seeing in our load test environment:

  • GIVEN Two application instances on separate EC2 instances with one Kafka consumer each and same consumer group ID.
  • GIVEN Consumption with a Spring BatchAcknowledgingMessageListener which synchronously consumes a batch and then calls Acknowledgment.acknowledge().
  • GIVEN single-threaded ConcurrentMessageListenerContainer with AckMode MANUAL_IMMEDIATE and syncCommits enabled.
  • WHEN One of the application instances shuts down (its consumer leaves the consumer group)
  • THEN The consumer group is rebalanced so that the remaining consumer gets all of the partitions.
  • WHEN The first application instance starts again (its consumer joins the consumer group).
  • THEN The consumer group is rebalanced so that each consumer gets half of the partitions.
  • THEN A handful of messages are successfully consumed by both consumers.
  • THEN The consumer that gets half of its partitions revoked logs one or more of this type of message before the partition revocation is logged: 2022-11-17 08:24:21,809 INFO [consumer-0-C-1] o.a.k.c.c.i.ConsumerCoordinator [ConsumerCoordinator.java:1156] [Consumer clientId=consumer-xms-batch-mt-callback-3, groupId=xms-batch-mt-callback] Failing OffsetCommit request since the consumer is not part of an active group

This happens sometimes when executing a test case with application instance restart during moderate traffic load (consumption of 1000 records per second). By contrast, when there is no duplication, the partition revocation is logged first and only after that there are log entries for failing offset commits.

Is duplicate consumption expected behaviour in this use case? I would think not, and I have a hard time understanding how it's even possible with a synchronous consumer. My understanding is that the listener thread always does polling and consumption sequentially, and that partition revocation happens during polling.

We are using kafka-clients version 3.1.1 and spring-kafka version 2.8.6, and I haven't found any known defect that explains the duplication.

We started off with AckMode MANUAL, then changed to MANUAL_IMMEDIATE in case it could be tied to deferred commits.

1

There are 1 answers

0
Mikael Carlstedt On

Reported as a spring-kafka defect, fix will be included in release 3.0.1.