Reprocess batches of items over and over again - and the batch might change any time

62 views Asked by At

I am just looking for ideas on how to solve one specific thing I'd like to build.

Say I have two sets of items. Each item is just a couple of lines of JSON. Any time an item is added to one set I immediately (well, almost) want to process this against the full other set. So item is added to set A: Process against each item in set B. And vice versa.

Items come in through API Gateway + Lambda. Match processing in Lambda from a queue/stream.

What AWS technology would be a good fit? I have no idea and no clear pattern on when or how often the sets change. Also, I want it to be as strongly consistent as possible. And of course, I want it to be as serverless and cost-effective as possible. :)

Options could be:

  • sets stored in Aurora, match processing for a new item in A would need to query the full set B from the database each time
  • sets stored in DynamoDB, maybe with DynamoDB stream in the background; match processing for a new item in A would need to query the full set B from Dynamo; but spiky load, not a good fit because of unclear read/write provisioning
  • have each set in its own "static" Kinesis stream where match processing reads through items but doesn't trim. Streams to be replaced with fresh sets regularly

My pain point is: While processing items from A there might be thousands of items in B to be matched. And I want to avoid having to load the full set B from some database every time I process an item from A. I was thinking about some caching of sets but then would need a good option to invalidate that cache whenever something changes.

0

There are 0 answers