Does a Kafka consumer read messages from the active segment of a partition?


Let us say I have a partition (partition-0) with 4 segments that are closed and eligible for compaction. Once compaction has run over all 4 of them, these segments will not contain duplicate data for any key.

Now, there is also an active segment that has not been closed yet. If a consumer starts reading from partition-0 in the meantime, does it also read the messages in the active segment?

Note: My goal is to not provide duplicate data to the consumer for a particular key.


There is 1 answer

Michael Heil (best answer)

Your concern is valid: the consumer also reads the messages in the active segment, and the log cleaner never compacts the active segment. Log compaction does not guarantee that you have exactly one value for a particular key, only at least one.

Here is how Log Compaction is introduced in the documentation:

Log compaction ensures that Kafka will always retain at least the last known value for each message key within the log of data for a single topic partition.
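To make this concrete, here is a small toy model in plain Python (no Kafka client involved). It assumes, as a simplification, that the cleaner has fully compacted every closed segment and left the active segment untouched; the record data and the `compact` helper are made up for illustration.

```python
from typing import List, Optional, Tuple

# (offset, key, value); offsets are preserved by compaction.
Record = Tuple[int, str, Optional[str]]


def compact(records: List[Record]) -> List[Record]:
    """Keep only the highest-offset record per key, in offset order."""
    latest = {}
    for offset, key, value in records:
        latest[key] = (offset, key, value)  # later offsets overwrite earlier ones
    return sorted(latest.values())


# Closed segments: eligible for cleaning (toy data).
closed = [(0, "k1", "a"), (1, "k2", "b"), (2, "k1", "c"), (3, "k2", "d")]
# Active segment: still being written, skipped by the cleaner.
active = [(4, "k1", "e"), (5, "k3", "f"), (6, "k1", "g")]

# What a consumer reading from the beginning would be handed.
visible = compact(closed) + active
keys_seen = [key for _, key, _ in visible]

print(visible)
print("k1 appears", keys_seen.count("k1"), "times")
```

Even in this best case the consumer sees `k1` more than once: one surviving record from the compacted part plus every `k1` record in the active segment.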

However, you can try to make compaction run more frequently so that the active segment and the non-compacted segments stay as small as possible. This comes at a cost, as running the log cleaner takes up resources.

There are many topic-level configurations related to log compaction. Here are the most important ones; all details can be looked up in the Kafka topic configuration documentation:

  • delete.retention.ms
  • max.compaction.lag.ms
  • min.cleanable.dirty.ratio
  • min.compaction.lag.ms
  • segment.bytes
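As a rough sketch of what a more aggressive setup could look like, the settings above can be collected into a plain dictionary. The numeric values are illustrative assumptions, not recommendations, and `cleanup.policy=compact` is added because the topic must be compacted for the other settings to matter. The `to_add_config` helper is hypothetical; it only renders the `key=value,key=value` form that tools such as `kafka-configs.sh --add-config` accept.

```python
# Illustrative values only; tune them for your workload.
compaction_config = {
    "cleanup.policy": "compact",
    # Roll the active segment sooner so more data becomes cleanable.
    "segment.bytes": str(16 * 1024 * 1024),
    # Let the cleaner start once 10% of the log is dirty.
    "min.cleanable.dirty.ratio": "0.1",
    # Minimum time a record stays uncompacted.
    "min.compaction.lag.ms": "0",
    # Maximum time a record may remain ineligible for compaction.
    "max.compaction.lag.ms": str(10 * 60 * 1000),
    # How long delete tombstones are retained.
    "delete.retention.ms": str(60 * 60 * 1000),
}


def to_add_config(config: dict) -> str:
    """Render the settings as a comma-separated key=value string."""
    return ",".join(f"{key}={value}" for key, value in config.items())


print(to_add_config(compaction_config))
```

Smaller segments and a lower dirty ratio shrink the uncompacted part of the log, but they also mean more files and more cleaner work on the broker.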

Even so, I am quite convinced that you cannot guarantee that your consumer never receives duplicates for a key from a log-compacted topic.
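If the goal is one value per key, a common workaround is to handle it on the consumer side: fold the records into a map keyed by message key, so that later records overwrite earlier ones. The sketch below is an assumption about how that could look, not part of Kafka itself. It uses plain `(key, value)` tuples in place of records from a real client, and the `materialize` name is hypothetical. A `None` value is treated as a tombstone, which is how deletions are represented in a compacted topic.

```python
from typing import Dict, Iterable, Optional, Tuple


def materialize(records: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, str]:
    """Reduce a stream of (key, value) records to the latest value per key."""
    state: Dict[str, str] = {}
    for key, value in records:
        if value is None:
            state.pop(key, None)  # tombstone: the key was deleted
        else:
            state[key] = value    # a newer value replaces the older one
    return state


# Toy input in offset order, including duplicates and a tombstone.
records = [
    ("k1", "c"), ("k2", "d"), ("k1", "e"),
    ("k3", "f"), ("k1", "g"), ("k2", None),
]

state = materialize(records)
print(state)
```

The consumer still receives the duplicates, but downstream code only ever sees the latest value for each key once the partition has been read up to its end.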