I understand that Kafka can pull events in batches. I am trying to understand this scenario:
- I have 4 partitions for a topic
- I have 1 consumer, to which Kafka assigns all 4 partitions.
- let's assume every batch the Kafka client pulls from Kafka contains 5 messages.
What I'm trying to understand here is: are the events in one batch all from the same partition, with the client moving round-robin to the next partition for the next batch, or does a single batch already contain events from different partitions?
I can't give you a precise answer, but I found it interesting enough to test it out.
For this, I created a topic with four partitions and used the `kafka-producer-perf-test` command-line tool to produce some messages into the topic. As the performance test tool does not set any message keys, the messages are written to the topic's partitions in round-robin fashion.

Afterwards, I created a simple KafkaConsumer with the configuration `max_poll_records=5`
to match your question. The consumer simply prints the offset and partition of each message it consumes.

The result, answering your question, is that the consumer tries to fetch as much data as possible from one partition before it moves on to the next. Only when all remaining messages from partition 1 had been consumed, but the `max_poll_records` limit of 5 was not yet reached, did it add two more messages from partition 2.

Here are some of the prints to get a better understanding.
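The observed behaviour can also be sketched without a broker. The following is a simplified, pure-Python simulation (not kafka-python's actual fetcher code) of how a poll drains buffered records from one partition first and only tops the batch up from the next partition when the first one runs short of `max_poll_records`; the partition count, record counts, and the `poll` helper are assumptions for illustration:

```python
from collections import deque

MAX_POLL_RECORDS = 5  # mirrors max_poll_records=5 from the test above

def poll(buffers, max_poll_records=MAX_POLL_RECORDS):
    """Return up to max_poll_records (partition, offset) pairs,
    draining one partition's buffer before moving to the next."""
    batch = []
    for partition, records in buffers.items():
        while records and len(batch) < max_poll_records:
            batch.append((partition, records.popleft()))
        if len(batch) == max_poll_records:
            break
    return batch

# Simulated fetched-record buffers: 4 partitions, 3 records each.
# In a real consumer these buffers are filled by fetch responses.
buffers = {p: deque(range(3)) for p in range(4)}

while any(buffers.values()):
    print(poll(buffers))
```

Each printed batch mirrors the observed pattern: a batch is filled from a single partition where possible, and a partially drained partition is topped up with records from the next one.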