Does a Kafka Streams GlobalKTable topic require the same number of partitions as the KStream topic it will be joined with?


We want to use a GlobalKTable in a Kafka Streams application. The input topics (the sources of the KTable/KStream) have N partitions, and the GlobalKTable will be used as a dictionary in the streams application.

Must the input topic for the GlobalKTable have the same number of partitions as the other input topics (the sources of the KTable/KStream)?

As I understand it, the answer is no: there is no such restriction, and the topic may have M partitions where N > M, because the GlobalKTable is fully loaded in each instance of the streams application and co-partitioning is not required for the KStream join. But I need confirmation from the experts!

Thank you!

There are 2 answers

Bartosz Wardziński (best answer)

No. The topics backing the KStream and the GlobalKTable that are being joined can have different numbers of partitions.

From the Kafka Streams developer guide:

At a high level, KStream-GlobalKTable joins are very similar to KStream-KTable joins. However, global tables provide you with much more flexibility, at some expense, when compared to partitioned tables:

  • They do not require data co-partitioning.

More details can be found here:

Global Table join

Join co-partitioning requirements

Ismael Idrissi

More accurately:

Why is data co-partitioning required? Because KStream-KStream, KTable-KTable, and KStream-KTable joins are performed based on the keys of records (e.g., leftRecord.key == rightRecord.key), it is required that the input streams/tables of a join are co-partitioned by key.
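To make the reason concrete, here is a minimal plain-Java sketch of what would go wrong in a partitioned join if the two sides had different partition counts. It is not Kafka Streams code: the class and method names are made up for illustration, and `hashCode` modulo the partition count is only an assumed stand-in for Kafka's real default partitioner.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CoPartitioningDemo {

    // Simplified stand-in partitioner (assumption): NOT Kafka's real default partitioner.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Spread a table's records over numPartitions, the way a partitioned topic would hold them.
    static List<Map<String, String>> partitionTable(Map<String, String> table, int numPartitions) {
        List<Map<String, String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new HashMap<>());
        }
        table.forEach((k, v) -> partitions.get(partitionFor(k, numPartitions)).put(k, v));
        return partitions;
    }

    // Partitioned join model: the task that handles stream partition p only sees table partition p.
    static String partitionLocalLookup(String key, int streamPartitions,
                                       List<Map<String, String>> tablePartitions) {
        int p = partitionFor(key, streamPartitions);
        if (p >= tablePartitions.size()) {
            return null; // there is no table partition with that number at all
        }
        return tablePartitions.get(p).get(key);
    }

    public static void main(String[] args) {
        Map<String, String> table = Map.of("a", "A", "b", "B", "c", "C", "d", "D");
        // Stream side has N = 4 partitions, table side has M = 3.
        List<Map<String, String>> tablePartitions = partitionTable(table, 3);
        for (String key : List.of("a", "b", "c", "d")) {
            System.out.println(key + " -> " + partitionLocalLookup(key, 4, tablePartitions));
        }
        // Prints: a -> A, b -> B, c -> null, d -> null (one per line)
    }
}
```

With equal partition counts and the same partitioner on both sides, every key would land in the same partition number on both sides and all four lookups would succeed. As far as I know, Kafka Streams checks the partition counts of such join inputs and refuses to run the join when they differ, rather than silently missing matches as this toy model does.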

The only exceptions are KStream-GlobalKTable joins. Here, co-partitioning is not required because all partitions of the GlobalKTable's underlying changelog stream are made available to each KafkaStreams instance, i.e. each instance has a full copy of the changelog stream. Further, a KeyValueMapper allows for non-key based joins from the KStream to the GlobalKTable.
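As a rough illustration of the non-key based lookup, here is a plain-Java model of a stream-to-global-table inner join. It is only a sketch, not the Kafka Streams API: the `Map` stands in for the full local copy of the table, the two `BiFunction` parameters play the roles that `KeyValueMapper` and `ValueJoiner` have in `KStream#join(GlobalKTable, KeyValueMapper, ValueJoiner)`, and the class name, method name and sample data are made up.

```java
import java.util.Map;
import java.util.function.BiFunction;

class GlobalLookupJoinDemo {

    // Models joining one stream record against a fully replicated table.
    // keySelector derives the table lookup key from the stream record's key AND value,
    // so the stream record's own key does not have to match the table key.
    static <K, V, GK, GV, R> R joinRecord(K key, V value,
                                          Map<GK, GV> globalTable,
                                          BiFunction<K, V, GK> keySelector,
                                          BiFunction<V, GV, R> joiner) {
        GK lookupKey = keySelector.apply(key, value);
        GV tableValue = globalTable.get(lookupKey);
        // Inner-join behaviour in this model: no match means no output (null here).
        return tableValue == null ? null : joiner.apply(value, tableValue);
    }

    public static void main(String[] args) {
        // Hypothetical dictionary: country code -> country name.
        Map<String, String> countryNames = Map.of("DE", "Germany", "FR", "France");

        // The stream record is keyed by order id; the lookup key is taken from its value.
        String joined = joinRecord("order-17", "DE", countryNames,
                (orderId, countryCode) -> countryCode,
                (countryCode, countryName) -> countryCode + "=" + countryName);
        System.out.println(joined); // prints "DE=Germany"
    }
}
```

Because the whole table is available locally, the lookup works no matter which stream partition the record came from, which is why the partition count of the GlobalKTable topic does not have to match.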