How to store failed data when sending to a topic in Spring Cloud Stream Kafka


I have configured 3 Kafka brokers running on different ports. I am using Spring Cloud Stream Kafka.

brokers: localhost:9092, localhost:9093, localhost:9094

I am building a data pipeline that receives a continuous stream of data and writes it to a Kafka topic on these 3 brokers. So far there is no problem. My concern is: suppose all 3 brokers go down for 5 minutes. During that time I cannot write to the Kafka topic, so there will be 5 minutes of data loss. From Spring Boot I only get a warning:

2020-10-06 11:44:20.840  WARN 2906 --- [ad | producer-2] org.apache.kafka.clients.NetworkClient   : [Producer clientId=producer-2] Connection to node 0 (/192.168.1.78:9092) could not be established. Broker may not be available. 

Is there a way to store the data temporarily when all brokers go down, and then resume writing to the topic from that temporary storage once the brokers are up again?


1 Answer

Michael Heil:

You could make use of the internal buffer the producer uses to send data to the cluster. Under the covers the KafkaProducer keeps a queue of records and a dedicated I/O thread that actually sends the data to the cluster.

In combination with the producer configuration retries (by default set to 0), you may want to increase buffer.memory, which is described as:

The total bytes of memory the producer can use to buffer records waiting to be sent to the server. If records are sent faster than they can be delivered to the server the producer will block for max.block.ms after which it will throw an exception.

This setting should correspond roughly to the total memory the producer will use, but is not a hard bound since not all memory the producer uses is used for buffering. Some additional memory will be used for compression (if compression is enabled) as well as for maintaining in-flight requests.
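
For illustration, here is a minimal sketch of how those properties could be set on a plain KafkaProducer; the topic name, serializers and values are assumptions for this example, not recommendations, and with Spring Cloud Stream the same keys would be passed through the Kafka binder's producer configuration rather than by building a producer yourself:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class BufferedProducerExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                    "localhost:9092,localhost:9093,localhost:9094");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

            // Retry failed sends instead of giving up immediately (example value)
            props.put(ProducerConfig.RETRIES_CONFIG, 10);
            // Enlarge the in-memory buffer so records can queue while brokers are unreachable
            // (64 MB here; the default is 32 MB)
            props.put(ProducerConfig.BUFFER_MEMORY_CONFIG, 64L * 1024 * 1024);
            // How long send() may block once the buffer is full before throwing (2 minutes here)
            props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 120_000L);

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // While the brokers are down, records accumulate in the buffer up to buffer.memory;
                // once it is full, send() blocks for at most max.block.ms and then throws
                producer.send(new ProducerRecord<>("my-topic", "key", "value"));
            }
        }
    }

Note that this only bridges short outages: whatever does not fit in the buffer within max.block.ms is still lost unless you catch the exception and persist the records elsewhere.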

However, I do not think that having the producer itself deal with a complete cluster failure is generally a good idea. Kafka itself is designed to cope with failures of individual brokers, but if all your brokers go down uncontrollably at the same time you may run into bigger issues than just missing some data from an individual producer.

If only one broker is unreachable for a period of time there is nothing to be done, as Kafka will internally switch the partition leader of the topic to another broker (provided the partition is replicated, of course).
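
As a sketch of that last point, assuming the Kafka AdminClient and a placeholder topic name and partition count: creating the topic with a replication factor greater than 1 is what lets leadership fail over to a surviving broker when a single broker goes down.

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewTopic;

    public class CreateReplicatedTopic {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                    "localhost:9092,localhost:9093,localhost:9094");

            try (AdminClient admin = AdminClient.create(props)) {
                // 3 partitions, replication factor 3: every broker holds a copy of each partition,
                // so the partition leader can move to another broker if one becomes unreachable
                NewTopic topic = new NewTopic("stream-data", 3, (short) 3);
                admin.createTopics(Collections.singletonList(topic)).all().get();
            }
        }
    }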