My Spark job that consumes from Kafka is building up a backlog of unprocessed data. After troubleshooting, I found that roughly once every 10 minutes, consuming a record becomes very slow. I traced it to this code:
while (consumerRecords.hasNext()) {
    long begin = System.currentTimeMillis();
    ConsumerRecord<String, Message> consumerRecord = consumerRecords.next();
    // time spent inside next(), in milliseconds
    long next = System.currentTimeMillis() - begin;
    ....
The consumerRecords object is of type KafkaRDD. In the slow case, the next() call took about 40 seconds to return a record, and that is what caused the backlog.
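To narrow down which calls are slow, the timing above could be moved into a small wrapper around the iterator. This is only a sketch: TimedIterator, its threshold parameter, and its getters are hypothetical names, not part of Spark or Kafka, and it uses plain java.util.Iterator so it runs without a cluster. It only times next(); hasNext() is passed through untimed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical helper: wraps an iterator and records how long each next() call takes. */
class TimedIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final long slowThresholdMs;
    private final List<Long> slowCallsMs = new ArrayList<>();
    private long maxMs = 0;
    private long calls = 0;

    TimedIterator(Iterator<T> delegate, long slowThresholdMs) {
        this.delegate = delegate;
        this.slowThresholdMs = slowThresholdMs;
    }

    @Override
    public boolean hasNext() {
        return delegate.hasNext();
    }

    @Override
    public T next() {
        // nanoTime is monotonic, so it is safer than currentTimeMillis for measuring durations
        long start = System.nanoTime();
        T value = delegate.next();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        calls++;
        maxMs = Math.max(maxMs, elapsedMs);
        if (elapsedMs >= slowThresholdMs) {
            // keep only the slow calls so the list stays small
            slowCallsMs.add(elapsedMs);
        }
        return value;
    }

    long getCalls() { return calls; }
    long getMaxMs() { return maxMs; }
    List<Long> getSlowCallsMs() { return slowCallsMs; }

    public static void main(String[] args) {
        // Stand-in data; in the real job the delegate would be the record iterator
        TimedIterator<String> it =
                new TimedIterator<>(Arrays.asList("a", "b", "c").iterator(), 1000L);
        while (it.hasNext()) {
            it.next();
        }
        System.out.println("calls=" + it.getCalls()
                + " maxMs=" + it.getMaxMs()
                + " slowCalls=" + it.getSlowCallsMs());
    }
}
```

Logging the slow calls together with the partition and offset of the returned record (not shown here) would make it possible to check whether the 40-second calls always hit the same partition.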
This is what my monitoring shows (batch time, record count, scheduling delay, processing time, total delay):
2020/10/19 18:03:44.000 7 records 40 min 0.4 s 40 min
2020/10/19 18:03:43.500 2 records 40 min 0.4 s 40 min
2020/10/19 18:03:43.000 7 records 39 min 40 s 40 min
2020/10/19 18:03:42.500 2 records 39 min 0.4 s 39 min
2020/10/19 18:03:42.000 8 records 39 min 0.4 s 39 min
I don't know how to troubleshoot this further, or what could make next() take so long.
Any guidance or suggestions would be appreciated. Thank you.