Kafka Blue/Green deployment, how to seamlessly migrate consumers


We are currently experiencing challenges while implementing the Blue/Green deployment strategy for our services.

Our service architecture consists of several micro-services and a Kafka queue, with external producers and consumers. The decision to adopt the Blue/Green strategy was made in order to seamlessly handle any breaking changes that may occur. However, we have encountered an obstacle.

The issue arises when utilizing the traffic manager to connect to our Kafka service. Currently, consumers are successfully connecting to Kafka using the traffic manager URL. When we attempt to switch to the new cluster, we simply need to modify the traffic manager settings, resulting in an updated URL for the Kafka server.

Unfortunately, despite this URL update, the consumers continue to consume data from the old cluster. We are seeking guidance on how to force the consumers to start consuming from the new cluster.

Are there any settings that can help us here?

TIA.


There are 2 answers

2
OneCricketeer On

I don't quite understand what "traffic manager" means here, since that's not part of the Kafka protocol, but:

  1. The brokers return their advertised.listeners to clients, even though you told the clients to bootstrap through the "traffic manager". This means that once connected, clients talk directly to the (original) brokers.
  2. Clients refresh their metadata every 5 minutes by default (metadata.max.age.ms) to find new brokers and partitions, but this assumes they are still talking to the same cluster.

Sounds like you need to stop/close the consumers and reconnect through your "traffic manager", so that the bootstrap request returns the new cluster's brokers.
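A minimal sketch of that close-and-reconnect step, in case it helps. It assumes the switch is observable from the client side (for example, the traffic manager hostname starts resolving to different addresses), which may not hold for your setup. `RebootstrappingConsumer`, `make_consumer` and `resolve_target` are hypothetical names for illustration, not part of any Kafka client.

```python
from typing import Callable, FrozenSet


class RebootstrappingConsumer:
    """Recreates the wrapped consumer when the bootstrap target changes.

    make_consumer and resolve_target are hypothetical hooks supplied by the
    caller; nothing in this class is a Kafka client API.
    """

    def __init__(self,
                 make_consumer: Callable[[], object],
                 resolve_target: Callable[[], FrozenSet[str]]):
        self._make_consumer = make_consumer
        self._resolve_target = resolve_target
        self._target = resolve_target()
        self._consumer = make_consumer()
        self.restarts = 0

    def poll(self, timeout: float = 1.0):
        current = self._resolve_target()
        if current != self._target:
            # Closing throws away the cached broker list learned from the
            # old cluster; the replacement consumer bootstraps from scratch.
            self._consumer.close()
            self._consumer = self._make_consumer()
            self._target = current
            self.restarts += 1
        return self._consumer.poll(timeout)

    def close(self) -> None:
        self._consumer.close()
```

With a real client, `make_consumer` would build and subscribe a new consumer pointed at the traffic manager address, and `resolve_target` could wrap something like `socket.getaddrinfo` on that hostname. Checking on every poll is only for brevity; in practice you would probably check on a timer. Note that this only fixes which cluster the consumer talks to, not where it starts reading.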

In general, I've found that Kafka consumers don't work well in a blue-green environment: the two sides share consumer groups, and both will be consuming data at the same time, potentially with different processing logic.

0
Steephen On

In short, you can't do it this way. Let me try to explain: you are switching the network traffic and hoping everything will keep working smoothly.

I know nothing about TrafficManager, so I am treating it as a black box that you rely on to switch the traffic and proxy the network.

The producer connects to the brokers using the address provided by your TrafficManager. For every record it produces, Kafka adds metadata, including the record's offset within its partition. When a consumer reads from Kafka, it commits the offsets it has processed under its consumer group, and Kafka stores those commits as records in an internal topic called __consumer_offsets. When you switch network traffic, you leave all of that metadata behind in the old cluster, so your consumers cannot simply carry on where they left off.
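To make the offset problem concrete, here is a toy model (plain Python, not a Kafka API) of how a consumer's starting position is chosen. The group, topic and offset values are made up, and `starting_offset` is a hypothetical helper. The one real setting it mirrors is `auto.offset.reset`, which applies when the cluster has no committed offset for the group.

```python
from typing import Dict, Tuple

# (group id, topic, partition) -> next offset to read. A toy stand-in for
# what ONE cluster keeps in its __consumer_offsets topic.
OffsetStore = Dict[Tuple[str, str, int], int]


def starting_offset(store: OffsetStore, group: str, topic: str,
                    partition: int, log_start: int, log_end: int,
                    auto_offset_reset: str = "latest") -> int:
    """Return where a consumer in `group` would begin reading."""
    committed = store.get((group, topic, partition))
    if committed is not None:
        return committed              # resume where the group left off
    if auto_offset_reset == "earliest":
        return log_start              # reprocess everything still retained
    if auto_offset_reset == "latest":
        return log_end                # skip everything already written
    raise LookupError("no committed offset and auto.offset.reset=none")


# Made-up state: the old (blue) cluster knows the group, the new one does not.
blue: OffsetStore = {("billing", "orders", 0): 1500}
green: OffsetStore = {}

print(starting_offset(blue, "billing", "orders", 0, 0, 2000))
print(starting_offset(green, "billing", "orders", 0, 0, 40))
print(starting_offset(green, "billing", "orders", 0, 0, 40, "earliest"))
```

On the old cluster the group resumes from its commit. On the new one there is nothing to resume from, so the consumer either skips or replays whatever is there. Even if the commits were copied over, the same record generally sits at a different offset in a different cluster, so the numbers would not line up without some translation step.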