Kafka Streams with exactly_once enabled generates several extra messages (with uncommitted transaction status)
I ran a test on my PC:
without "exactly_once", for 100_000 messages I got 100_000 on the target topic.
with props.put(PROCESSING_GUARANTEE_CONFIG, "exactly_once"); for 100_000 messages, I got 100_554 on the target topic. In this last case, consuming the target topic with "read_committed" allows reading only 100_000 messages, but the remaining 554 pollute the flow monitoring.
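For context, this is roughly the Streams configuration used in the test (a sketch using the plain string config keys; the application id and broker address are placeholders):

```java
import java.util.Properties;

public class StreamsProps {
    // Build the Kafka Streams properties for the test run.
    // "processing.guarantee" is the string value behind
    // StreamsConfig.PROCESSING_GUARANTEE_CONFIG; the application id
    // and bootstrap address below are placeholders.
    static Properties build(boolean exactlyOnce) {
        Properties props = new Properties();
        props.put("application.id", "copy-topic-app");    // placeholder
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        if (exactlyOnce) {
            props.put("processing.guarantee", "exactly_once");
        }
        return props;
    }

    public static void main(String[] args) {
        System.out.println(StreamsProps.build(true).getProperty("processing.guarantee"));
    }
}
```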
Is there a reason to get 554 more messages when activating the "exactly_once" option?
Thank you.
The 554 extra messages are most likely the transaction markers that are needed to provide exactly-once delivery semantics.
When you use exactly-once, Kafka Streams uses Kafka transactions to write records to the output topics. Kafka transactions use transaction markers to mark whether records were part of a committed or an aborted transaction. Each commit or abort writes one marker (a control record) per partition the transaction produced to, and every marker occupies one offset, which is why offset-based counts come out higher than the number of actual data records.
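Assuming all 554 extra offsets are markers, the arithmetic from the test above works out as follows (the number of commits per partition is not known from the question, so this only checks the totals):

```java
public class OffsetMath {
    public static void main(String[] args) {
        // Figures from the test run described in the question.
        long offsetsOnTarget = 100_554; // offsets seen by flow monitoring
        long dataRecords     = 100_000; // records returned under read_committed

        // Each committed (or aborted) transaction writes one control
        // record per partition it produced to, and each control record
        // occupies one offset, so the difference is the marker count.
        long markers = offsetsOnTarget - dataRecords;
        System.out.println(markers + " transaction markers"); // prints "554 transaction markers"
    }
}
```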
A consumer with isolation level "read_committed" interprets the transaction markers to decide which records to skip (because they were part of an aborted transaction) and which records to return in calls to poll() (because they were part of a committed transaction).
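For the monitoring consumer, the isolation level can be set like this (a sketch using the plain string config keys; the broker address and group id are placeholders, and the default isolation level is "read_uncommitted"):

```java
import java.util.Properties;

public class MonitoringConsumerProps {
    // Consumer configuration for the monitoring client; the broker
    // address and group id are placeholders. With
    // isolation.level=read_committed the consumer returns only records
    // from committed transactions and skips the transaction markers.
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "flow-monitoring");         // placeholder
        props.put("isolation.level", "read_committed");   // default is read_uncommitted
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(MonitoringConsumerProps.build().getProperty("isolation.level"));
    }
}
```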