I have a system where 300+ IoT devices each send data every 10 seconds. The average message size is less than 1 KB. I'm using Kinesis Data Streams for data ingestion and AWS Lambda for processing, and I have provisioned 2 shards in Kinesis. I need to find the optimal batch size and batching window to keep processing as close to real time as possible.
Here are the details of my setup:
- Number of IoT devices: 300+
- Data generation frequency: 10 seconds
- Average data size: < 1 KB per message
- Data ingestion platform: Kinesis Data Streams
- Number of Kinesis shards: 2
- Processing platform: AWS Lambda
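For reference, here is my back-of-the-envelope estimate of the ingest rate. It is only a rough sketch: it assumes exactly 300 devices, a full 1 KB per message, and partition keys that spread records evenly over both shards. The limits in the constants are the standard per-shard write limits for Kinesis Data Streams (1,000 records/s and 1 MB/s).

```python
# Rough ingest estimate for the setup above; all inputs are rounded assumptions.
DEVICES = 300       # "300+" rounded down to a fixed number
INTERVAL_S = 10     # each device sends one message every 10 seconds
MESSAGE_KB = 1.0    # upper bound on the average message size
SHARDS = 2

# Per-shard write limits for Kinesis Data Streams.
SHARD_MAX_RECORDS_PER_S = 1000
SHARD_MAX_KB_PER_S = 1024  # 1 MB/s

records_per_s = DEVICES / INTERVAL_S
kb_per_s = records_per_s * MESSAGE_KB

# Assumes partition keys distribute records evenly across the shards.
records_per_shard = records_per_s / SHARDS
kb_per_shard = kb_per_s / SHARDS

print(f"total:     {records_per_s:.0f} records/s, {kb_per_s:.0f} KB/s")
print(f"per shard: {records_per_shard:.0f} records/s, {kb_per_shard:.0f} KB/s")
print(f"record-rate utilisation per shard: "
      f"{records_per_shard / SHARD_MAX_RECORDS_PER_S:.1%}")
```

If these assumptions hold, each shard sees about 15 records/s, so the stream itself is far from its write limits.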
One formula I came across for calculating the batch size is:
batch_size = desired_latency / data_generation_frequency
Is this formula correct for calculating the optimal batch size?
Additionally, is there a similar formula for calculating the optimal window size?
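To show how I am reading the formula, here is a small sketch. The 2-second latency target is a placeholder I picked purely for illustration, and the second calculation (per-shard arrival rate multiplied by the latency target) is my own guess at an alternative reading, not something taken from a reference.

```python
# Sketch of two readings of the batch-size formula; the latency target is hypothetical.
DEVICES = 300
INTERVAL_S = 10          # data generation frequency: one message per device per 10 s
SHARDS = 2
DESIRED_LATENCY_S = 2.0  # hypothetical target, not a measured requirement


def batch_size_from_formula(desired_latency_s, generation_interval_s):
    """The formula exactly as quoted: desired_latency / data_generation_frequency."""
    return desired_latency_s / generation_interval_s


def records_arriving_per_shard(desired_latency_s, devices, interval_s, shards):
    """Alternative reading (my assumption): records reaching one shard within the
    latency budget, assuming records are spread evenly across shards."""
    per_shard_rate = devices / interval_s / shards  # records per second per shard
    return per_shard_rate * desired_latency_s


formula_result = batch_size_from_formula(DESIRED_LATENCY_S, INTERVAL_S)
arrivals = records_arriving_per_shard(DESIRED_LATENCY_S, DEVICES, INTERVAL_S, SHARDS)

print(formula_result)  # 0.2
print(arrivals)        # 30.0
```

With these numbers the quoted formula gives 0.2, which is not a usable batch size, while the alternative reading gives 30 records per shard. This gap is part of why I doubt the formula.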
Questions:
- What is the optimal batch size for processing data in Lambda to maintain real-time performance?
- What is the optimal batching window (the time Lambda waits to collect Kinesis records before invoking the function) to ensure efficient processing?
- Should I consider Enhanced Fan-Out (EFO), i.e. dedicated per-consumer read throughput, instead of the default shared throughput?
- What are some additional factors to consider when optimizing data batching for real-time processing?
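For context, these are the settings I am trying to tune on the Lambda event source mapping. The sketch below only builds the parameters for boto3's `create_event_source_mapping`. The function name and stream ARN are hypothetical, and the values shown are placeholders (100 is the default batch size for Kinesis), not recommendations.

```python
# Sketch: the Lambda event source mapping settings in question.
# Names and values are placeholders; the range checks mirror the documented limits.

def build_mapping_params(function_name, stream_arn,
                         batch_size=100, window_s=1, parallelization=1):
    """Build keyword arguments for lambda.create_event_source_mapping."""
    if not 1 <= batch_size <= 10_000:
        raise ValueError("BatchSize for Kinesis must be between 1 and 10,000")
    if not 0 <= window_s <= 300:
        raise ValueError("MaximumBatchingWindowInSeconds must be between 0 and 300")
    if not 1 <= parallelization <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return {
        "FunctionName": function_name,
        "EventSourceArn": stream_arn,
        "StartingPosition": "LATEST",
        "BatchSize": batch_size,                     # max records per invocation
        "MaximumBatchingWindowInSeconds": window_s,  # max wait to fill a batch
        "ParallelizationFactor": parallelization,    # concurrent batches per shard
    }


def create_mapping(params):
    """Apply the settings; needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch runs without boto3 installed
    return boto3.client("lambda").create_event_source_mapping(**params)


params = build_mapping_params(
    function_name="my-iot-processor",  # hypothetical function name
    stream_arn="arn:aws:kinesis:us-east-1:123456789012:stream/my-iot-stream",  # hypothetical ARN
)
print(params)
```

As I understand it, Lambda invokes the function when the batch size is reached, the batching window expires, or the payload limit is hit, whichever comes first, so the two settings interact.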
I'm open to any suggestions or best practices for optimizing my Kinesis and Lambda configuration for real-time processing.