Read and write transactions in Amazon Kinesis


I'm new to Kinesis, so this might seem like a very basic question, but I have not been able to find a clear answer to what the actual difference is between a read and write transaction in a Kinesis stream.

Relevant parts from Amazon Kinesis Limits:

  • GetShardIterator can provide up to 5 transactions per second per open shard.
  • GetRecords can retrieve 10 MB of data.
  • Each shard can support up to 5 transactions per second for reads, up to a maximum total data read rate of 2 MB per second.
  • Each shard can support up to 1024 records per second for writes, up to a maximum total data write rate of 1 MB per second (including partition keys). This write limit applies to operations such as PutRecord and PutRecords.
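Taken together, the write-side limits imply a minimum shard count for a given workload. A quick back-of-envelope helper, assuming the per-shard figures quoted above (the function and its name are illustrative, not part of any AWS SDK):

```python
import math

# Per-shard write limits quoted above
MAX_WRITE_RECORDS_PER_SEC = 1024           # records per second per shard
MAX_WRITE_BYTES_PER_SEC = 1 * 1024 * 1024  # 1 MB per second per shard

def shards_needed(records_per_sec: float, bytes_per_sec: float) -> int:
    """Minimum number of open shards to absorb a given write workload."""
    by_count = math.ceil(records_per_sec / MAX_WRITE_RECORDS_PER_SEC)
    by_bytes = math.ceil(bytes_per_sec / MAX_WRITE_BYTES_PER_SEC)
    return max(by_count, by_bytes, 1)

# e.g. 5,000 events/s averaging 2 KB each: the byte limit dominates
print(shards_needed(5_000, 5_000 * 2 * 1024))  # -> 10
```

Note that whichever limit (record count or byte rate) is hit first determines the shard count.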

It clearly mentions 5 reads and 1024 writes per second per shard. Why are reads so much more expensive than writes, or is there a crucial Kinesis concept here I haven't grasped?


There are 2 answers

Uilton Dutra (Best Answer)

Kinesis enables you to ingest granular data into a stream and read batches of records to process the information. So the volume of megabytes you can read per second matters much more than the number of read transactions you get per shard. For example, you might have a busy website generating thousands of views per minute and an EMR cluster processing your access logs. In this scenario, you will have many more write events than read events. The same is true for clickstreams, financial transactions, social media feeds, IT logs, location-tracking events, etc.

Guy

The common use case is that multiple producers are writing their events to Kinesis: for example, multiple web servers, browsers, or mobile devices. Each producer can write events one by one or in batches of up to 500 events per call.
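Because a single PutRecords call accepts at most 500 records, a producer typically chunks its pending events before sending. A minimal sketch of that batching logic (the actual boto3 call is shown only as a comment, since it also needs a stream name, per-record partition keys, and retry handling for throttled records):

```python
def batch_records(records, max_batch=500):
    """Split a list of pending events into PutRecords-sized batches."""
    return [records[i:i + max_batch] for i in range(0, len(records), max_batch)]

batches = batch_records(list(range(1_200)))
print([len(b) for b in batches])  # -> [500, 500, 200]

# Illustrative only -- a real producer would do something like:
# import boto3
# kinesis = boto3.client("kinesis")
# for batch in batches:
#     kinesis.put_records(
#         StreamName="my-stream",
#         Records=[{"Data": data, "PartitionKey": key} for data, key in batch],
#     )
```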

On the other hand, the consumers of the events are a small number of processes. The simple use case is a "slow" reader that reads batches of events from the Kinesis stream (for example, 10,000 events every 10 seconds) and writes them to S3 as a single log file.

In such a case you are writing thousands of events (mostly one by one), but you are reading only once per second (or once every 10 seconds, as in the example above) all the events that were added to the stream in that period. Therefore, the ratio of writes to reads can be as high as 1024:1.
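The "slow reader" pattern above can be sketched as a simple simulation, independent of the Kinesis API: events arrive at a steady rate, and one batch read per window drains everything written in that window (the function and numbers are illustrative):

```python
def drain_in_windows(events, events_per_sec, window_sec):
    """Simulate a slow reader: one GetRecords-style batch per window."""
    per_window = events_per_sec * window_sec
    return [events[i:i + per_window] for i in range(0, len(events), per_window)]

# 1,000 events/s written for 30 s, drained every 10 s -> 3 batches of 10,000
batches = drain_in_windows(list(range(30_000)), 1_000, 10)
print([len(b) for b in batches])  # -> [10000, 10000, 10000]
```

Each batch would then be written to S3 as one log file, so 30,000 writes cost only 3 read transactions.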

In most cases there is a small number of consumers of the Kinesis stream rather than a single reader. For example, on top of the "slow" reader above, you can have a "fast" reader that scans the incoming events and filters or summarizes them, in order to react in real time. This fast reader can identify fraudulent transactions and block them, or calculate real-time counters for operational dashboards.

Still, the number of reads will be small relative to the number of writes. In such a case, the "fast" reader will read every 1/4 second to allow a near-real-time reaction to the events. Therefore, the ratio of writes to reads will be 1024:5 (one slow read plus four fast reads per second).
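The arithmetic behind that final ratio, assuming the per-shard figures quoted in the question:

```python
writes_per_sec = 1024      # per-shard write limit quoted above
slow_reads_per_sec = 1     # batch reader, one GetRecords call per second
fast_reads_per_sec = 4     # near-real-time reader, one call every 1/4 second

total_reads = slow_reads_per_sec + fast_reads_per_sec
print(f"{writes_per_sec}:{total_reads}")  # -> 1024:5
```

Note that five combined reads per second exactly saturates the 5-transactions-per-second read limit per shard, which is why the limit is workable in practice.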