I'm confused about how KCL works. First of all, here is my current understanding:
- One KCL application uses one application name and creates one DynamoDB table.
- One KCL application has one worker with x record processors working in parallel on the x shards in a stream.
- The DynamoDB table keeps track of the owner, checkpoint, etc. of each shard.
If I create multiple KCL applications, let's say 3, with different application names, then they are basically different applications reading from the same stream, isolated from each other by having separate DynamoDB tables. All 3 of them will read all x shards in the stream and keep track of their checkpoints separately.
Based on a few docs that I read, for example: https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-scaling.html
I would assume that if I create another KCL application with the same application name, there would be 2 KCL applications working on the same stream, with the shards load-balanced across the 2 workers in the 2 apps.
So, technically, I can create 8 KCL apps (let's say there are 8 shards in the stream) on 8 EC2 instances, and each of them will process exactly one shard without clashing, since each of them checkpoints in its own row in the DynamoDB table.
I thought that was the case, but this post suggests otherwise: Multiple different consumers of same Kinesis stream
Otherwise, how can I achieve this?
All workers associated with this application name are assumed to be working together on the same stream. These workers may be distributed on multiple instances. If you run an additional instance of the same application code, but with a different application name, the KCL treats the second instance as an entirely separate application that is also operating on the same stream.
as mentioned here https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-implementation-app-java.html#kinesis-record-processor-initialization-java
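To make the quoted behaviour concrete, here is a toy model in plain Java. It is only a sketch: `LeaseTableSketch`, `startWorker`, and the round-robin assignment are hypothetical names and simplifications of my own, and the real KCL coordinates ownership through leases stored in its DynamoDB table rather than anything this simple. The point it illustrates is that workers sharing an application name share one table and split the shards, while a different application name gets its own table and sees every shard.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of per-application lease tables; NOT the real KCL algorithm. */
class LeaseTableSketch {
    // One "lease table" per application name: shardId -> owning worker.
    private final Map<String, Map<String, String>> tables = new HashMap<>();
    // Workers registered under each application name.
    private final Map<String, List<String>> workers = new HashMap<>();
    private final List<String> shards;

    LeaseTableSketch(List<String> shards) {
        this.shards = new ArrayList<>(shards);
    }

    /** Adds a worker under an application name and rebalances that application's table only. */
    void startWorker(String applicationName, String workerId) {
        List<String> ws = workers.computeIfAbsent(applicationName, k -> new ArrayList<>());
        ws.add(workerId);
        Map<String, String> table = new TreeMap<>();
        for (int i = 0; i < shards.size(); i++) {
            // Assumption: simple round-robin stands in for KCL's lease balancing.
            table.put(shards.get(i), ws.get(i % ws.size()));
        }
        tables.put(applicationName, table);
    }

    /** Returns the shard -> owner mapping for one application name. */
    Map<String, String> leases(String applicationName) {
        return tables.getOrDefault(applicationName, new TreeMap<>());
    }

    /** Counts how many shards a worker owns within one application. */
    long shardsOwnedBy(String applicationName, String workerId) {
        return leases(applicationName).values().stream().filter(workerId::equals).count();
    }

    public static void main(String[] args) {
        LeaseTableSketch stream =
                new LeaseTableSketch(Arrays.asList("shard-0", "shard-1", "shard-2", "shard-3"));
        stream.startWorker("app-a", "worker-1"); // same application name ...
        stream.startWorker("app-a", "worker-2"); // ... so the shards are split
        stream.startWorker("app-b", "worker-3"); // different name: separate table, all shards
        System.out.println("app-a: " + stream.leases("app-a"));
        System.out.println("app-b: " + stream.leases("app-b"));
    }
}
```

In this model, starting 8 workers under one application name against 8 shards leaves each worker with exactly one shard, which matches the setup described in the question.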
Reference:
https://www.amazonaws.cn/en/kinesis/data-streams/faqs/#recordprocessor
https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-scaling.html
https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-implementation-app-java.html#kinesis-record-processor-initialization-java
The KCL needs a ConfigsBuilder, to which you pass streamName, applicationName, kinesisAsyncClient, etc. The application name you specify here is tied to that stream name, and the KCL uses it as the name of the DynamoDB table that tracks the stream's shards.
So if you have multiple streams, create multiple software.amazon.kinesis.common.ConfigsBuilder instances, each with its own streamName and associated applicationName, and pass each ConfigsBuilder's properties to its own software.amazon.kinesis.coordinator.Scheduler.
This way you will have one DynamoDB table for every stream, and the instances of your multi-instance app will share each stream's shards instead of every instance consuming every event.
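The wiring described above might look roughly like the following, assuming KCL 2.x and the AWS SDK for Java v2 on the classpath. `PerStreamSchedulers`, `schedulerFor`, and `start` are hypothetical helper names of my own; the clients and the `ShardRecordProcessorFactory` are assumed to be created elsewhere in your application. Treat it as a sketch of one ConfigsBuilder and one Scheduler per stream, not a complete consumer.

```java
import software.amazon.awssdk.services.cloudwatch.CloudWatchAsyncClient;
import software.amazon.awssdk.services.dynamodb.DynamoDbAsyncClient;
import software.amazon.awssdk.services.kinesis.KinesisAsyncClient;
import software.amazon.kinesis.common.ConfigsBuilder;
import software.amazon.kinesis.coordinator.Scheduler;
import software.amazon.kinesis.processor.ShardRecordProcessorFactory;

import java.util.UUID;

/** Hypothetical helper: builds one Scheduler per (streamName, applicationName) pair. */
final class PerStreamSchedulers {

    static Scheduler schedulerFor(String streamName,
                                  String applicationName,
                                  KinesisAsyncClient kinesis,
                                  DynamoDbAsyncClient dynamo,
                                  CloudWatchAsyncClient cloudWatch,
                                  ShardRecordProcessorFactory factory) {
        // The application name doubles as the DynamoDB lease table name,
        // so every instance must use the same name for the same stream.
        ConfigsBuilder configs = new ConfigsBuilder(
                streamName, applicationName, kinesis, dynamo, cloudWatch,
                UUID.randomUUID().toString(), // worker identifier, unique per instance
                factory);

        return new Scheduler(
                configs.checkpointConfig(),
                configs.coordinatorConfig(),
                configs.leaseManagementConfig(),
                configs.lifecycleConfig(),
                configs.metricsConfig(),
                configs.processorConfig(),
                configs.retrievalConfig());
    }

    /** Scheduler is a Runnable, so each stream gets its own thread. */
    static Thread start(Scheduler scheduler) {
        Thread thread = new Thread(scheduler);
        thread.setDaemon(true);
        thread.start();
        return thread;
    }
}
```

With two streams you would call `schedulerFor` twice, for example with made-up names such as ("orders", "orders-consumer") and ("payments", "payments-consumer"), and start both schedulers. Every deployed instance runs the same code with the same application names, so the instances share each table.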