Kafka Streams with exactly_once enabled generates extra messages (with uncommitted transaction status)
I ran a test on my machine:
Without "exactly_once", for 100_000 messages I got 100_000 messages on the target topic.
With props.put(PROCESSING_GUARANTEE_CONFIG, "exactly_once"); for 100_000 messages I got 100_554 messages on the target topic. In that case, consuming the target topic with "read_committed" returns only 100_000 messages, but the remaining 554 pollute our flow monitoring.
Is there a reason for the 554 extra messages when the "exactly_once" option is enabled?
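For reference, here is a minimal sketch of the configuration I used; the application id and broker address are placeholders, and "processing.guarantee" is the plain-string key behind PROCESSING_GUARANTEE_CONFIG:

```java
import java.util.Properties;

public class ExactlyOnceStreamsConfig {

    // Builds the Streams configuration as plain properties.
    public static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "copy-topic-app");     // placeholder app id
        props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
        props.put("processing.guarantee", "exactly_once"); // the setting in question
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("processing.guarantee"));
    }
}
```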
Thank you.
The 554 messages are most likely the transaction markers that are needed to provide exactly-once delivery semantics.
When you use exactly-once, Kafka Streams uses Kafka transactions to write records to the output topics. Kafka transactions use transaction markers to mark whether records were part of a committed or an aborted transaction.
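As a rough illustration (the transaction and partition counts below are assumptions for the sketch, not values taken from your setup): each committed or aborted transaction writes one control record (the transaction marker) per partition it touched, so the topic's end offsets advance past the data records, and offset-based monitoring counts more than the data records alone:

```java
public class MarkerOverhead {

    // End offsets cover data records plus one control record (marker)
    // per partition per transaction. All inputs are illustrative.
    static long totalOffsets(long dataRecords, long transactions, int partitionsPerTxn) {
        return dataRecords + transactions * (long) partitionsPerTxn;
    }

    public static void main(String[] args) {
        // e.g. 100_000 data records and 554 markers overall would show
        // up as 100_554 positions in the topic.
        System.out.println(totalOffsets(100_000, 554, 1)); // prints 100554
    }
}
```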
A consumer with isolation level read_committed interprets the transaction markers to decide which records to skip (because they were part of an aborted transaction) and which records to return in calls to poll() (because they were part of a committed transaction).
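A minimal sketch of such a consumer configuration, using plain property keys (the broker address and group id are placeholders; "isolation.level" is the relevant setting, and its default is "read_uncommitted"):

```java
import java.util.Properties;

public class ReadCommittedConsumerConfig {

    // Builds a consumer configuration that skips records from aborted
    // transactions; markers themselves are never returned by poll().
    public static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "monitoring-consumer");       // placeholder group id
        props.put("isolation.level", "read_committed");     // default: read_uncommitted
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build().getProperty("isolation.level"));
    }
}
```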