I am trying to create a replayable event stream using Kafka. I understand that in standard usage compaction is designed to clean up duplicate keys within the log [source], keeping only the latest value per key, and to remove keys whose latest value is empty (a tombstone). However, in my topic I'd like to keep the duplicate keys. For example, I'd want a partition that looks roughly like this (keys and values are placeholders):
K1:V1, K2:V2, K1:V3, K2:null, K1:V4
After compaction, only the tombstoned key K2 would be purged, while every update for K1 is kept:
K1:V1, K1:V3, K1:V4
Continuing publishing, further duplicates would simply accumulate:
K1:V1, K1:V3, K1:V4, K1:V5, K3:V6
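For context, this is roughly how I publish the updates and the tombstone; a minimal sketch, assuming string serializers, a topic called "events" and a local broker, all of which are just placeholders:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class PublishSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Duplicate keys: I want every update for K1 to survive compaction
            producer.send(new ProducerRecord<>("events", "K1", "V1"));
            producer.send(new ProducerRecord<>("events", "K2", "V2"));
            producer.send(new ProducerRecord<>("events", "K1", "V3"));
            // Tombstone: a null value marks K2 for removal during compaction
            producer.send(new ProducerRecord<>("events", "K2", null));
            producer.send(new ProducerRecord<>("events", "K1", "V4"));
            producer.flush();
        }
    }
}
```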
I believe I can use the following configurations to achieve some version of this, where the topic retains at least two weeks of data and anything older is compacted down to the latest state per key (a sketch of creating the topic with these settings follows the list).
Topic retention set to unlimited (retention.ms=-1)
Cleanup policy set to compaction (cleanup.policy=compact)
Log cleaner enabled on the broker (log.cleaner.enable=true, the default)
Segments rolled at least once a day so they become eligible for cleaning (segment.ms=86400000)
Tombstone markers dropped as soon as a segment is cleaned (delete.retention.ms=0)
No segment compacted until every message in it is more than two weeks old (min.compaction.lag.ms=1209600000)
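Something like this is what I had in mind for creating the topic via the Java AdminClient; a sketch only, where the topic name, partition count and replication factor are placeholders, and log.cleaner.enable stays in the broker's server.properties because it is not a topic-level config:

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption

        try (Admin admin = Admin.create(props)) {
            NewTopic topic = new NewTopic("events", 3, (short) 1) // name/partitions/RF are placeholders
                    .configs(Map.of(
                            TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT, // compact, never delete
                            TopicConfig.RETENTION_MS_CONFIG, "-1",                  // unlimited retention
                            TopicConfig.SEGMENT_MS_CONFIG, "86400000",              // roll a segment at least daily
                            TopicConfig.MIN_COMPACTION_LAG_MS_CONFIG, "1209600000", // don't compact records younger than 2 weeks
                            TopicConfig.DELETE_RETENTION_MS_CONFIG, "0"             // drop tombstones once a segment is cleaned
                    ));
            // log.cleaner.enable=true is a broker setting and cannot be passed here
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```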
However, the optimal version of my topic would be compacted every night, purging only the keys that have tombstone values (K2 in the example) and leaving everything else, so that a new consumer can join, replay from offset 0, and receive every (non-purged) update for each key.
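For completeness, the replay I have in mind would look roughly like this; a sketch assuming a fresh consumer group, string deserializers and the same placeholder topic name:

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class ReplaySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumption
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "replay-consumer");         // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // A brand-new group starts from the earliest offset, so it replays every retained record
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Every non-purged update for each key arrives in offset order per partition
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```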
So, I'd like to ask whether anyone has implemented compaction this way, and for any suggestions on fine-tuning the policy to retain non-tombstoned keys for as long as possible.