There is a DynamoDB table Entity which has a hash key on id and a GSI on another attribute, cardId. The GSI has only a hash (partition) key and no sort (range) key.
Whenever we get a batch of create/update requests, we first use the GSI to read the existing data and then write to the main table, which eventually propagates the changes to the GSI. During this time, we may also serve parallel read requests from the GSI.
We are seeing an issue where the latency of both the main table and the GSI increases from 200 ms to 10-15 seconds during this time (batch writes plus reads). I have not been able to establish a correlation between consecutive reads and writes on the table. The table uses on-demand capacity and there is no throttling. "SuccessfulRequestLatency" is only ~300-400 ms.
It is the DDB client method call that takes seconds. It does no data transformation; it just returns the DB data as is to the upper layers. Is there anything else I should be monitoring to get to the root cause?
Thanks!
I don't have a full answer, but do have some directions you might want to investigate.
First, something I have noticed in the past is that extremely long latencies can indicate that your client gave up on a request and retried it. Some clients hide this retry, so from the outside it just looks like one very slow request.
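To illustrate how a hidden retry can turn a request that normally returns in a few hundred milliseconds into a 10-second call, here is a small deterministic simulation. The function names, the 5 s per-attempt timeout and the backoff values are hypothetical, not the defaults of any particular AWS SDK. The point is that the caller only sees the sum of all attempts plus the backoff sleeps between them, while a server-side metric such as SuccessfulRequestLatency does not include the time the client spent waiting on abandoned attempts or sleeping between retries.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CallTrace:
    """What happened inside one logical client call (hypothetical model)."""
    attempt_outcomes: List[str] = field(default_factory=list)  # "timeout" / "success"
    waited_s: List[float] = field(default_factory=list)        # client wait per attempt
    backoff_s: List[float] = field(default_factory=list)       # sleep before each retry

    @property
    def total_s(self) -> float:
        # Latency the caller observes: every attempt plus every backoff sleep.
        return sum(self.waited_s) + sum(self.backoff_s)


def simulate_call(attempt_latencies_s: List[float],
                  attempt_timeout_s: float,
                  max_attempts: int,
                  base_backoff_s: float = 0.05) -> CallTrace:
    """Simulate a client that silently retries timed-out attempts.

    attempt_latencies_s[i] is how long attempt i would take to get a response,
    as seen by the client. An attempt slower than attempt_timeout_s is abandoned
    at the timeout and retried after an exponential backoff (no jitter, so the
    result is deterministic).
    """
    trace = CallTrace()
    n = min(max_attempts, len(attempt_latencies_s))
    for i in range(n):
        latency = attempt_latencies_s[i]
        if latency <= attempt_timeout_s:
            trace.attempt_outcomes.append("success")
            trace.waited_s.append(latency)
            return trace
        trace.attempt_outcomes.append("timeout")
        trace.waited_s.append(attempt_timeout_s)
        if i + 1 < n:
            trace.backoff_s.append(base_backoff_s * 2 ** i)
    raise TimeoutError(f"all {n} attempts timed out")


if __name__ == "__main__":
    # Two attempts hang and hit a 5 s timeout; the third returns in 0.3 s.
    trace = simulate_call([30.0, 30.0, 0.3], attempt_timeout_s=5.0, max_attempts=3)
    print(trace.attempt_outcomes)   # ['timeout', 'timeout', 'success']
    print(round(trace.total_s, 2))  # 10.45
```

The caller sees one ~10.45 s call, even though the attempt that succeeded took only 0.3 s. If your client happens to be boto3, each response's ResponseMetadata includes a RetryAttempts count, which is a cheap way to check whether this is happening; other SDKs expose retry counts through their own logging or metrics.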
Second, you're right that on-demand billing mode doesn't throttle based on provisioned throughput, but it can nevertheless throttle - see https://aws.amazon.com/premiumsupport/knowledge-center/on-demand-table-throttling-dynamodb/. By default there are limits on the throughput an on-demand table can have, as well as on how quickly that throughput may grow. These limits are at least partially for your protection - you wouldn't want a runaway application to accidentally issue billions of requests and cost you a million dollars :-)
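As a rough way to reason about the growth limit, here is a simplified model. It assumes the behaviour as I understand the AWS documentation: an on-demand table instantly absorbs up to double its previous traffic peak, and never more than the table's throughput quota. The function names are hypothetical, and the numbers in the example (a previous peak of 1,000 request units/s and a quota of 40,000) are placeholders; check the linked page and your account's service quotas for the real values. It ignores per-partition limits and the gradual ramp-up DynamoDB does over time.

```python
from typing import List


def on_demand_ceiling(previous_peak: float, table_quota: float) -> float:
    """Simplified instant ceiling for an on-demand table, in request units/s.

    Assumption: on-demand mode absorbs up to double the previous peak right away,
    capped by the table's throughput quota.
    """
    return min(2.0 * previous_peak, table_quota)


def samples_at_risk(rates: List[float], previous_peak: float,
                    table_quota: float) -> List[int]:
    """Indices of per-second rate samples above the simplified ceiling.

    The previous peak is held fixed, i.e. this models a burst (such as a batch
    of creates/updates) arriving faster than DynamoDB ramps up capacity.
    """
    ceiling = on_demand_ceiling(previous_peak, table_quota)
    return [i for i, rate in enumerate(rates) if rate > ceiling]


if __name__ == "__main__":
    # Placeholder traffic: a batch job pushes the rate well past 2x the old peak.
    burst = [500, 1800, 2500, 4000]
    print(samples_at_risk(burst, previous_peak=1000, table_quota=40000))  # [2, 3]
```

In this model, the last two samples exceed 2,000 request units/s and are the ones that could be throttled, even though the table never comes close to its quota.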