How does a migrated Kafka consumer behave after repartitioning during cluster migration?


I have a couple of conceptual doubts about the behaviour of consumers.

Cluster migration is a great opportunity to repartition topics that are over-partitioned or under-partitioned in the current cluster.

  1. In the case of repartitioning in the destination cluster, the offset-to-record mapping will not be the same in the destination cluster as in the source. Does that mean a consumer cannot resume in the target cluster from where it left off in the source cluster, since `__consumer_offsets` will not hold the same mapping?

  2. Replicator performs offset translation using timestamps; does that help in this case? If so, Replicator's translation can only be used with Java clients. How can I manage this scenario for non-Java clients?

  3. I am not sure whether Cluster Linking allows repartitioning, since it is a byte-for-byte copy. Can you please help me understand this as well? Does MM2 have any configuration to cover this?

Please help me understand this, thank you.


1 Answer

Answer by ChristDist

I'm not sure which migration strategy you are after, but if you are migrating from one infrastructure to another, you can correlate with the following.

Set `auto.offset.reset = earliest` on the consumer side, so that all newly produced events are picked up from offset 0 of each partition. One check should be made: move your producers first, wait for all messages to be drained by the existing consumers, and only then point your consumers at the new cluster.
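As a sketch, the consumer configuration for the destination cluster might look like this (the bootstrap address and group name are placeholders):

```properties
# Hypothetical consumer properties for the destination cluster
bootstrap.servers=new-cluster:9092
group.id=my-consumer-group
# With no committed offset in the new cluster, start from the
# beginning of each partition instead of the default "latest"
auto.offset.reset=earliest
```

Note that `auto.offset.reset` only applies when the group has no committed offset for a partition; if the group already committed offsets in the new cluster, those win.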

And to clarify your concerns, from the documentation:

  1. MirrorMaker contains a utility class RemoteClusterUtils to enable consumers to seek to the last checkpointed offset in a DR cluster with offset translation when failing over from a primary cluster. Support for periodic migration of consumer offsets was added in 2.7.0 to automatically commit translated offsets to the target __consumer_offsets topic so that consumers switching to a DR cluster can restart from where they left off in the primary cluster with no data loss and minimal duplicate processing. Consumer groups for which offsets are migrated can be customized, and for added protection, MirrorMaker does not overwrite offsets if consumers on the target cluster are actively using the target consumer group, thus avoiding any accidental conflicts.
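The periodic offset migration described above is driven by MM2's checkpoint settings. A sketch of the relevant properties (the `source`/`target` cluster aliases and the group pattern are placeholders; defaults may vary by Kafka version):

```properties
# MM2 checkpoint / offset-sync settings (connect-mirror-maker.properties)
source->target.emit.checkpoints.enabled=true
# Added in Kafka 2.7: periodically write translated offsets into the
# target cluster's __consumer_offsets topic
source->target.sync.group.offsets.enabled=true
source->target.sync.group.offsets.interval.seconds=60
# Restrict which consumer groups have their offsets migrated
source->target.groups=my-consumer-group.*
```

Because the translated offsets land in the target's `__consumer_offsets` topic itself, any client that reads committed offsets normally (not only Java clients) can resume from them, which addresses your second question.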

  2. MirrorMaker uses a Kafka topic for storing offset translation metadata. Offsets are stored whenever the difference between the two offsets changes. For example, if offset 495 on the primary mapped to offset 500 on the DR cluster, we’ll record (495,500) in the external store or offset translation topic. If the difference changes later due to duplicates and offset 596 is mapped to 600, then we’ll record the new mapping (596,600). There is no need to store all the offset mappings between 495 and 596; we just assume that the difference remains the same and so offset 550 in the primary cluster will map to 555 in the DR. Then when failover occurs, instead of mapping timestamps (which are always a bit inaccurate) to offsets, we map primary offsets to DR offsets and use those. One of the two techniques listed previously can be used to force consumers to start using the new offsets from the mapping.
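The delta-based translation described above can be sketched in a few lines. This is a simplified illustration (class and method names are hypothetical, not MirrorMaker's actual internals):

```python
class OffsetTranslator:
    """Stores (source_offset, target_offset) pairs only when the delta
    between them changes, then translates by the nearest preceding pair."""

    def __init__(self):
        self.mappings = []  # ascending list of (source_offset, target_offset)

    def record(self, source_offset, target_offset):
        # Skip the write if the delta is unchanged since the last mapping
        if self.mappings:
            last_src, last_dst = self.mappings[-1]
            if target_offset - source_offset == last_dst - last_src:
                return
        self.mappings.append((source_offset, target_offset))

    def translate(self, source_offset):
        # Apply the delta of the latest mapping at or before source_offset
        result = None
        for src, dst in self.mappings:
            if src <= source_offset:
                result = source_offset + (dst - src)
            else:
                break
        return result


t = OffsetTranslator()
t.record(495, 500)  # delta +5, stored
t.record(496, 501)  # same delta, skipped
t.record(596, 600)  # delta +4, stored
print(t.translate(550))  # -> 555, via the (495, 500) mapping
print(t.translate(600))  # -> 604, via the (596, 600) mapping
```

This matches the example in the quote: only (495,500) and (596,600) are stored, and offset 550 translates to 555.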

  3. No. MirrorMaker periodically checks for new topics in the source cluster and starts mirroring these topics automatically if they match the configured patterns. If more partitions are added to the source topic, the same number of partitions is automatically added to the target topic, ensuring that events in the source topic appear in the same partitions in the same order in the target topic.
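The topic and partition discovery behaviour above is controlled by MM2's refresh settings. A sketch with illustrative values (aliases are placeholders; check your version's defaults):

```properties
# MM2 topic/partition discovery settings
source->target.refresh.topics.enabled=true
source->target.refresh.topics.interval.seconds=600
# Pattern of topics to mirror; new matching topics are picked up on refresh
source->target.topics=.*
```

So MM2 will mirror partition *additions* from source to target, but it has no configuration to mirror into a *different* partition count; repartitioning a topic during migration means consumers cannot rely on translated offsets for it and must fall back to a strategy like the `auto.offset.reset` approach described earlier.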