DynamoDB ItemCount alternative


I have a use case where I receive records from an upstream system for a particular batchID, along with some metadata about the batch up front. For example, I am told that batchID="ABC" will have 2000 records. As records arrive in my service, I do some processing and save each one in the DB with status = "PROCESSED". Once I have received all 2000 records for a batchID, I have to create a CSV file containing every record in the batch and send it to another service. I then update the status to "SENT".

Approach 1 (naive): Run a query on a composite GSI on batchID+status and check whether the count matches on every request. This will be very expensive.

Approach 2: Use a DynamoDB atomic counter, where key = batchID and the value is a count. On every DB insert, I increment the count, then check it and raise a trigger if it matches the expected total. But this approach is exposed to throttling and errors (e.g. the record is saved but the counter update fails).

Had it been SQL, I would have run:

SELECT COUNT(*) FROM records_table WHERE batchID = 'ABC';

I would like to know whether there is some hybrid approach in AWS that I can leverage to solve this use case.

There is 1 answer

Best answer, by webjaros

I'd suggest using a second table to index batches and track how many records have been processed. You could use a DynamoDB stream to run a Lambda function that updates the count when needed (that is, when a record reaches the desired status). The same Lambda function would also check whether the count has reached 2000 and, if so, trigger another Lambda function that does the sending. A more detailed description of the architecture follows.

DynamoDBDataTable

  • PK some data
  • GSI batchID
  • Data {status, ...someOtherData}

DynamoDBBatchIndexingTable

  • PK batchID
  • Data {amountOfProcessedItems, isSent}

Lambda1

  • Triggered by DynamoDBDataTable stream
  • If the status of a record in the stream has changed to "PROCESSED", it increments amountOfProcessedItems by 1 on the DynamoDBBatchIndexingTable item with PK = batchID.
  • If amountOfProcessedItems has now reached the expected total for the batch (2000 in the example), it triggers Lambda2.
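
The Lambda1 steps above could be sketched roughly as follows. This is a minimal sketch under several assumptions, not a definitive implementation: the stream is assumed to be configured with both old and new images, `dynamodb_client` is assumed to be a boto3 DynamoDB client (for example `boto3.client("dynamodb")`), and `trigger_lambda2`, `handle_stream_event`, and the `expected` parameter are hypothetical names introduced here for illustration. The attribute names `batchID`, `status`, and `amountOfProcessedItems` follow the tables described above.

```python
def became_processed(record):
    """Return True if this stream record shows status changing to PROCESSED."""
    if record.get("eventName") not in ("INSERT", "MODIFY"):
        return False
    images = record.get("dynamodb", {})
    new_status = images.get("NewImage", {}).get("status", {}).get("S")
    old_status = images.get("OldImage", {}).get("status", {}).get("S")
    return new_status == "PROCESSED" and old_status != "PROCESSED"


def count_processed_per_batch(event):
    """Group newly processed records in one stream event by batchID."""
    counts = {}
    for record in event.get("Records", []):
        if became_processed(record):
            batch_id = record["dynamodb"]["NewImage"]["batchID"]["S"]
            counts[batch_id] = counts.get(batch_id, 0) + 1
    return counts


def handle_stream_event(event, dynamodb_client, trigger_lambda2,
                        index_table="DynamoDBBatchIndexingTable",
                        expected=2000):
    """Increment the per-batch counter and trigger Lambda2 on completion.

    `expected` is hard-coded here for illustration; it could instead be read
    from the batch metadata received up front.
    """
    for batch_id, n in count_processed_per_batch(event).items():
        # ADD is applied atomically and creates the attribute if it is missing.
        response = dynamodb_client.update_item(
            TableName=index_table,
            Key={"batchID": {"S": batch_id}},
            UpdateExpression="ADD amountOfProcessedItems :n",
            ExpressionAttributeValues={":n": {"N": str(n)}},
            ReturnValues="UPDATED_NEW",
        )
        total = int(response["Attributes"]["amountOfProcessedItems"]["N"])
        # Fire only on the update that crosses the threshold.
        if total - n < expected <= total:
            trigger_lambda2(batch_id)
```

One counter update per batchID per stream event keeps write traffic on the indexing table lower than one update per record. If Lambda retries a stream batch after a partial failure, the counter could be incremented twice, so the isSent flag remains useful as a second guard.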

Lambda2

  • Triggered by Lambda1.
  • Gets all the records of the batch by querying the GSI on batchID.
  • Creates the CSV file and sends it to the other service. You will probably need a Lambda function with at least 1 GB of memory for this.
  • Updates DynamoDBBatchIndexingTable, setting isSent = true.
  • Updates all the records in DynamoDBDataTable that have this batchID (found via the GSI) to status = "SENT". In your case, changing isSent alone may or may not be enough; I don't have enough detail about the context to say.
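
The fetch-and-export part of Lambda2 could look something like the sketch below. It assumes `dynamodb_client` is a boto3 DynamoDB client, that the GSI is named `batchID-index` (a hypothetical name, since the index name is not given above), and that every attribute is a simple scalar such as a string or number. The function names are illustrative only, and sending the file to the other service is left out because that service is not described.

```python
import csv
import io


def fetch_batch_items(dynamodb_client, batch_id,
                      table="DynamoDBDataTable", index="batchID-index"):
    """Query the GSI for one batch, following pagination until exhausted."""
    kwargs = {
        "TableName": table,
        "IndexName": index,  # hypothetical index name
        "KeyConditionExpression": "batchID = :b",
        "ExpressionAttributeValues": {":b": {"S": batch_id}},
    }
    items = []
    while True:
        page = dynamodb_client.query(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # A single Query response is capped at 1 MB, so keep paging.
        kwargs["ExclusiveStartKey"] = last_key


def plain_value(attr):
    """Unwrap a typed attribute such as {"S": "x"} or {"N": "1"}.

    Simplification: nested lists and maps are not flattened.
    """
    (_type, value), = attr.items()
    return value


def items_to_csv(items):
    """Render items as CSV text with one column per attribute name."""
    columns = sorted({name for item in items for name in item})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for item in items:
        writer.writerow({name: plain_value(attr) for name, attr in item.items()})
    return buffer.getvalue()
```

Building the whole CSV in memory is the simplest option and is one reason memory sizing matters here; for much larger batches, writing rows to a temporary file page by page would be an alternative.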