How to implement parallelism in Kafka using Node.js consumers?


Theoretically speaking, since Node.js is single-threaded, how can I achieve parallelism when I define multiple consumers to increase throughput?

For example, if I have a Kafka topic that has 4 partitions, how would I be able to consume 4 messages in parallel on the consumer end with Node.js? At most I can achieve concurrency using the single-threaded event loop.

One possible solution would be to fork child processes (in this case 3), so that each process can receive messages from a particular partition, assuming the system has 3 idle cores. But how efficient/effective would this approach be?

What would be the best possible way to achieve this?


1 Answer

Answered by Giorgos Myrianthous

In Kafka, partitions are the unit of parallelism: the more partitions a topic has, the more consumers can read from it at the same time, and the higher the throughput you can achieve.

A Kafka topic is divided into a number of partitions, which enables parallelism by splitting the data across multiple brokers. Multiple partitions allow multiple consumers to read from a topic in parallel. Therefore, in order to achieve parallel processing, you need to split your topic into more than one partition.
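As a sketch of what a single consumer-group member might look like in Node.js, assuming the third-party kafkajs client (the broker address, topic, and group names below are placeholders):

```javascript
// Hedged sketch of one consumer-group member, assuming the kafkajs client.
// Kafka assigns this process a share of the topic's partitions; starting
// more processes with the same groupId spreads partitions across them.
const formatMessage = (partition, value) => `partition ${partition}: ${value}`;

async function run() {
  const { Kafka } = require('kafkajs');
  const kafka = new Kafka({ clientId: 'demo-app', brokers: ['localhost:9092'] });
  const consumer = kafka.consumer({ groupId: 'demo-group' });

  await consumer.connect();
  await consumer.subscribe({ topics: ['topicName'], fromBeginning: true });

  await consumer.run({
    eachMessage: async ({ partition, message }) => {
      // message.value is a Buffer (or null for tombstones); kept simple here.
      console.log(formatMessage(partition, message.value.toString()));
    },
  });
}

// Only attempt to connect when explicitly enabled, since this needs a broker.
if (process.env.KAFKA_DEMO === '1') run().catch(console.error);
```

Running several copies of this process (one per core) in the same consumer group gives you process-level parallelism without any manual partition bookkeeping.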

To increase the number of partitions of an existing topic, you can simply run:

bin/kafka-topics.sh \
    --zookeeper localhost:2181 \
    --alter \
    --topic topicName \
    --partitions 40

This won't move existing data, though. Also be aware that if your producers use message keys, adding partitions changes which partition a given key maps to, so key-based ordering only holds for messages written after the change.
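On Kafka 2.2 and newer, kafka-topics.sh can talk to the brokers directly instead of going through ZooKeeper (the broker address here is a placeholder):

```shell
bin/kafka-topics.sh \
    --bootstrap-server localhost:9092 \
    --alter \
    --topic topicName \
    --partitions 40
```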


Note on consumers, consumer groups and partitions
If you have N partitions, then you can have up to N consumers within the same consumer group, each reading from a single partition. When you have fewer consumers than partitions, some of the consumers will read from more than one partition. Conversely, if you have more consumers than partitions, the extra consumers will be inactive and will receive no messages at all.
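The assignment rules above can be illustrated with a small round-robin sketch. This mimics, in simplified form, what the group coordinator does; it is not Kafka's actual assignor:

```javascript
// Simplified round-robin partition assignment: partition p goes to
// consumer p % M. Consumers beyond the partition count get nothing.
function assignPartitions(numPartitions, consumerIds) {
  const assignment = new Map(consumerIds.map((id) => [id, []]));
  for (let p = 0; p < numPartitions; p++) {
    assignment.get(consumerIds[p % consumerIds.length]).push(p);
  }
  return assignment;
}

// 4 partitions, 2 consumers: each consumer reads 2 partitions.
console.log(assignPartitions(4, ['c1', 'c2']));
// 4 partitions, 6 consumers: c5 and c6 are assigned nothing (inactive).
console.log(assignPartitions(4, ['c1', 'c2', 'c3', 'c4', 'c5', 'c6']));
```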