Running Kafka cost-effectively at the expense of lower resilience


Let's say I have a cheap and less reliable datacenter A, and an expensive and more reliable datacenter B. I want to run Kafka in the most cost-effective way, even if that means risking data loss and/or downtime. I can run any number of brokers in either datacenter, but remember that costs need to be as low as possible.

For this scenario, assume that no costs are incurred if brokers are not running. Also assume that producers/consumers run completely reliably with no concern for their cost.

Two ideas I have are as follows:

  1. Provision two completely separate Kafka clusters, one in each datacenter, but keep the cluster in the more expensive datacenter (B) powered off. Upon detecting an outage in A, power on the cluster in B. Producers/consumers will have logic to switch between clusters (see the sketch after this list).
  2. Run the Zookeeper cluster in B, with powered on brokers in A, and powered off brokers in B. If there is an outage in A, then brokers in B come online to pick up where A left off.
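A minimal sketch of what that client-side switching logic in Option 1 could look like, assuming kafka-python and made-up broker addresses for each datacenter:

```python
# Hedged sketch of Option 1's client failover: try the cheap cluster (A) first,
# fall back to the expensive cluster (B). Broker addresses are placeholders.
from kafka import KafkaProducer
from kafka.errors import NoBrokersAvailable

CLUSTERS = {
    "A": ["broker1.dc-a.example:9092"],   # cheap, normally active
    "B": ["broker1.dc-b.example:9092"],   # expensive, powered on only during an outage in A
}

def connect_with_failover():
    for name, servers in CLUSTERS.items():
        try:
            producer = KafkaProducer(bootstrap_servers=servers, acks="all")
            print(f"connected to cluster {name}")
            return producer
        except NoBrokersAvailable:
            print(f"cluster {name} unreachable, trying next")
    raise RuntimeError("no cluster reachable")

producer = connect_with_failover()
producer.send("events", b"hello")
producer.flush()
```

Consumers would need an equivalent fallback, plus some way of deciding when to switch back once A recovers.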

Option 1 would be cheaper, but requires more complexity in the producers/consumers. Option 2 would be more expensive, but requires less complexity in the producers/consumers.

Is Option 2 even possible? If there is an outage in A, is there any way to have brokers in B come online, get elected as leaders for the topics, and have the producers/consumers seamlessly start sending to them? Again, data loss is okay and so is switchover downtime, but whichever option is chosen must not require manual intervention.

Is there any other approach that I can consider?


There are 2 answers

Answer from OneCricketeer

Neither is feasible.

Topics and their records are unique to each cluster, and only one leader replica can exist for any given partition within a cluster.

With these two pieces of information, example scenarios include:

  • Producers cut over to the new cluster and have to discover the new leaders until the old cluster comes back.
  • Even if that happened instantaneously, or with minimal retries, the consumers are then responsible for reading from where? A single consumer cannot aggregate data from more than one bootstrap.servers list at a time.
  • So you end up in a situation where both clusters always need to be available, with N consumer threads for the N partitions in the other cluster and M threads for the original cluster.
  • Meanwhile, once producers switch back to writing to the appropriate (cheaper) cluster, data can arrive out of order, because you have no control over which consumer threads process which data first.
  • Only after tracking the consumer lag of the more-expensive cluster's consumers can you reasonably stop those threads and shut that cluster down, once lag reaches zero across all consumers (a lag check is sketched below).
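For that last point, a rough sketch of measuring a group's remaining lag on the expensive cluster with kafka-python; the group name and broker address are placeholders:

```python
# Hedged sketch: sum up consumer-group lag on cluster B before shutting it down.
from kafka import KafkaConsumer
from kafka.admin import KafkaAdminClient

BOOTSTRAP = ["broker1.dc-b.example:9092"]
GROUP = "my-consumer-group"

admin = KafkaAdminClient(bootstrap_servers=BOOTSTRAP)
committed = admin.list_consumer_group_offsets(GROUP)   # {TopicPartition: OffsetAndMetadata}

consumer = KafkaConsumer(bootstrap_servers=BOOTSTRAP)
end_offsets = consumer.end_offsets(list(committed))    # {TopicPartition: latest offset}

total_lag = sum(end_offsets[tp] - meta.offset for tp, meta in committed.items())
print(f"total lag: {total_lag}")
if total_lag == 0:
    print("safe to stop the consumers pointed at this cluster")
```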

Another thing to keep in mind is that topic creation/update/delete events aren't automatically synced across clusters, so Kafka Streams apps in particular will be unable to maintain state with this approach.


You can use tools like MirrorMaker, Confluent Replicator, or Cluster Linking to help with all of this, but I've personally never seen the client failover piece handled very well, especially when record order and idempotent processing matter.
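For illustration, a minimal MirrorMaker 2 configuration that replicates one way from A to B (broker addresses are placeholders); note that both clusters have to be reachable while it runs:

```properties
# connect-mirror-maker.properties (illustrative), run with:
#   bin/connect-mirror-maker.sh connect-mirror-maker.properties
clusters = A, B
A.bootstrap.servers = broker1.dc-a.example:9092
B.bootstrap.servers = broker1.dc-b.example:9092

# one-way replication A -> B
A->B.enabled = true
A->B.topics = .*
B->A.enabled = false

# also mirror committed consumer offsets so consumers can resume on B
sync.group.offsets.enabled = true
```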


Ultimately, this is what availability zones are for. From what I understand, the chance of a cloud provider losing more than one availability zone at a time is extremely rare. So you'd set up one Kafka cluster across three or more availability zones and configure "rack awareness" so Kafka accounts for where each broker is installed.
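A rough sketch of the rack-awareness settings involved, with placeholder AZ names; each broker gets its own broker.rack, and consumers can optionally set client.rack to read from the nearest replica:

```properties
# server.properties on one broker in AZ us-east-1a (illustrative values)
broker.id=1
broker.rack=us-east-1a

# spread replicas across the three AZs and tolerate losing one
default.replication.factor=3
min.insync.replicas=2

# optional (KIP-392): allow consumers with client.rack set to fetch from the closest replica
replica.selector.class=org.apache.kafka.common.replica.RackAwareReplicaSelector
```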

Answer from Chris Larsen

If you want to keep the target/passive cluster shut down while it is not operational and then spin it up on failover, you should be OK as long as you don't need any history and don't care about the consumer-lag gap left in the source cluster. Obviously this is use-case dependent.

MM2, or any sort of asynchronous directional replication, requires the target cluster to be running all the time, so it doesn't fit the powered-off model.

A stretch cluster isn't really doable because of the two-datacenter constraint: whether you use KRaft or ZooKeeper, you need a third datacenter for quorum, and that would probably be your most expensive option.

Redpanda has the capability of offloading all of your log segments to S3 and indexing them so they can be used by other clusters. So if you constantly wrote one copy of your log segments to a storage array with an S3 interface in your standby DC, that might be palatable: whenever needed, you spin up a cluster on demand in the target DC, point it at the object store, and can immediately start producing and consuming with your new clients.
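A hedged sketch of what enabling that offload could look like with rpk, assuming Redpanda's tiered-storage cluster properties; the bucket, endpoint, and credentials are placeholders:

```sh
# Point tiered storage at the standby DC's S3-compatible store (illustrative values).
rpk cluster config set cloud_storage_enabled true
rpk cluster config set cloud_storage_bucket kafka-standby-segments
rpk cluster config set cloud_storage_region us-east-1
rpk cluster config set cloud_storage_api_endpoint s3.dc-b.example
rpk cluster config set cloud_storage_access_key <ACCESS_KEY>
rpk cluster config set cloud_storage_secret_key <SECRET_KEY>

# Per topic: write segments to object storage and allow reading them back from it.
rpk topic create events -c redpanda.remote.write=true -c redpanda.remote.read=true
```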