I'm new to Kinesis, so this might seem like a very basic question, but I have not been able to find a clear explanation of the actual difference between a read and a write transaction in a Kinesis stream.
Relevant parts of the Amazon Kinesis Limits documentation:
- GetShardIterator can provide up to 5 transactions per second per open shard.
- GetRecords can retrieve up to 10 MB of data per call.
- Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.
- Each shard can support up to 1,000 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). This write limit applies to operations such as PutRecord and PutRecords.
It clearly mentions 5 reads and 1,000 writes per second per shard. Why are reads so much more expensive than writes, or is there a crucial Kinesis concept here I haven't grasped?
Kinesis is designed for ingesting fine-grained data into a stream and reading it back in batches for processing. The two limits count different things: the write limit counts individual records, while the read limit counts GetRecords calls, and a single call can return a batch of up to 10,000 records. So the number of megabytes you can read per second (2 MB, twice the write rate) matters much more than the number of read transactions you get per shard. For example, you might have a busy website generating thousands of page views per minute and an EMR cluster processing your access logs. In this scenario, you will have many more write events than read events. The same applies to clickstreams, financial transactions, social media feeds, IT logs, and location-tracking events.
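A rough sketch of that arithmetic follows. It assumes the per-shard limits quoted in the question and treats "MB" as MiB (1,048,576 bytes). Because writes are counted in records and reads in calls, converting both to records per second shows that five batched reads per second are not the bottleneck. `read_shard_in_batches` is a hypothetical helper, not part of any AWS SDK. It assumes a client that behaves like `boto3.client("kinesis")`, whose `get_shard_iterator` and `get_records` methods it calls. The 500-byte record size in the usage note is made up for illustration.

```python
import time

# Per-shard limits as quoted in the question (assumed; check the current AWS docs).
# "MB" is treated as MiB here, which is an assumption.
MIB = 1024 * 1024
WRITE_RECORDS_PER_SEC = 1_000
WRITE_BYTES_PER_SEC = 1 * MIB
READ_CALLS_PER_SEC = 5
READ_BYTES_PER_SEC = 2 * MIB
MAX_RECORDS_PER_GET = 10_000


def max_write_rate(record_bytes: int) -> int:
    """Records per second one shard accepts for records of this size."""
    return min(WRITE_RECORDS_PER_SEC, WRITE_BYTES_PER_SEC // record_bytes)


def max_read_rate(record_bytes: int) -> int:
    """Records per second one consumer can pull from a shard using batched GetRecords calls."""
    return min(READ_CALLS_PER_SEC * MAX_RECORDS_PER_GET, READ_BYTES_PER_SEC // record_bytes)


def read_shard_in_batches(client, stream_name, shard_id, max_calls=10, pause=1 / READ_CALLS_PER_SEC):
    """Hypothetical helper: read one shard in batches, at most 5 GetRecords calls per second.

    `client` is assumed to behave like boto3.client("kinesis").
    Returns the payloads (bytes) of every record read.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest retained record
    )["ShardIterator"]

    payloads = []
    # max_calls bounds the loop, because an open shard keeps returning new iterators.
    for _ in range(max_calls):
        if iterator is None:  # a closed shard that has been read to the end
            break
        # One read transaction, but it can return a whole batch of records.
        resp = client.get_records(ShardIterator=iterator, Limit=MAX_RECORDS_PER_GET)
        payloads.extend(record["Data"] for record in resp["Records"])
        iterator = resp.get("NextShardIterator")
        time.sleep(pause)  # space out calls to stay under the per-shard read limit
    return payloads
```

With hypothetical 500-byte log records, `max_write_rate(500)` is 1,000 records per second while `max_read_rate(500)` is 4,194 records per second. Under these assumptions, a consumer making five batched calls per second can comfortably drain everything the shard is able to accept.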