How to tackle eventual consistency issues on AWS


I'm working on a project to manage documents (e.g. create, read, and maintain different versions), and I plan to use the following AWS architecture.


When a document is created or updated, it is saved to a versioning-enabled S3 bucket through an API Gateway S3 proxy. The S3 PUT event triggers a Lambda function that fetches the latest version ID and all version IDs of the object and saves them to DynamoDB. Once the record is saved in the DynamoDB table, it is indexed in Elasticsearch via a DynamoDB stream.
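A minimal sketch of the S3-triggered Lambda described above, in Python. It assumes the boto3 S3 client and DynamoDB `Table` resource are passed in (so the logic can be exercised with fakes), and it uses a hypothetical table keyed on `doc_id` (the S3 object key) with hypothetical attribute names `latest_version_id` and `version_ids`. Only the first page of `list_object_versions` is read, so a document with more than 1,000 versions would need pagination.

```python
from urllib.parse import unquote_plus


def make_handler(s3_client, table):
    """Return a Lambda-style handler bound to an S3 client and a DynamoDB table.

    Assumption (hypothetical schema): the table's partition key is `doc_id`,
    set to the S3 object key.
    """

    def handler(event, context):
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            # Keys in S3 event notifications are URL-encoded (spaces become '+').
            key = unquote_plus(record["s3"]["object"]["key"])

            # Prefix matching can also return other keys that share the prefix,
            # so keep exact matches only. Only the first page is read here.
            resp = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
            versions = [v for v in resp.get("Versions", []) if v["Key"] == key]
            if not versions:
                continue

            # If the current version is a delete marker, no entry here has
            # IsLatest set, so fall back to the first listed version.
            latest = next((v for v in versions if v["IsLatest"]), versions[0])
            table.put_item(
                Item={
                    "doc_id": key,
                    "latest_version_id": latest["VersionId"],
                    "version_ids": [v["VersionId"] for v in versions],
                }
            )

    return handler
```

In a real deployment the handler would be built once at module load, e.g. `handler = make_handler(boto3.client("s3"), boto3.resource("dynamodb").Table("documents"))`, where the table name is a placeholder.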

I plan to use Elasticsearch for all search queries and to load the latest documents from DynamoDB. Since each record holds the S3 version IDs, I can fetch old versions from S3 as well.
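A minimal sketch of that read path, assuming boto3 clients are passed in and a hypothetical table keyed on `doc_id` (the S3 object key). `ConsistentRead=True` is a real DynamoDB `GetItem` parameter that requests a strongly consistent read; it removes DynamoDB's own read lag but does nothing for the S3-to-Lambda or stream-to-Elasticsearch hops.

```python
def load_latest(table, doc_id):
    """Fetch the current document record from DynamoDB, or None if absent.

    A strongly consistent read reflects every write that completed before it,
    so a record the Lambda has already stored is returned.
    """
    resp = table.get_item(Key={"doc_id": doc_id}, ConsistentRead=True)
    return resp.get("Item")


def load_version(s3_client, bucket, doc_id, version_id):
    """Fetch the raw bytes of one specific (possibly old) version from S3."""
    resp = s3_client.get_object(Bucket=bucket, Key=doc_id, VersionId=version_id)
    return resp["Body"].read()
```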

Because my architecture relies heavily on eventual consistency (S3 to DynamoDB, and DynamoDB to Elasticsearch), I'm worried that right after creating a document, querying either Elasticsearch or DynamoDB will not return the latest document data.

Any suggestions for improvements would be much appreciated.

Thanks!


There are 2 answers

Answer by jlaitio (best answer)

As you said, your application architecture has several points where eventual consistency is used.

If your business case absolutely requires that a query always returns the very latest version of the data, then these architecture choices are a poor fit, and you should consider, for example, using RDS for persistence instead.

If not, design the rest of your system with the understanding that a completed PUT does not guarantee that subsequent queries will immediately return the data. How to do this depends heavily on your application and cannot feasibly be generalized.
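One possible way to design around this, sketched here as an illustration rather than as the answer's prescription: after a write, the client polls the read path with exponential backoff until it sees the version it just wrote. This assumes the client learns the version ID of its own write (S3 returns it in the `x-amz-version-id` header for versioning-enabled buckets; whether your API Gateway integration passes that header through is an assumption about your setup). `wait_until_visible` and `fetch_version` are hypothetical names.

```python
import time


def wait_until_visible(fetch_version, expected_version, attempts=5,
                       initial_delay=0.1, sleep=time.sleep):
    """Poll until the read path reports the version the client just wrote.

    fetch_version: hypothetical callable returning the version ID currently
        visible (e.g. from an Elasticsearch or DynamoDB lookup), or None.
    expected_version: the version ID returned when the document was written.
    Returns True once visible, False if every attempt saw stale data.
    """
    delay = initial_delay
    for attempt in range(attempts):
        if fetch_version() == expected_version:
            return True
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2  # exponential backoff between polls
    return False
```

Returning False instead of raising leaves the decision to the caller, for example showing the document from the write response while the search index catches up.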

Answer by gkatzioura

Since you use a DynamoDB stream, your DynamoDB insert will reach your Elasticsearch server, but with a delay. In case of a write failure, it is up to the client to retry. You also have to account for the time it takes for the S3 event to fire, for the DynamoDB stream to trigger, and for Elasticsearch to index the document.
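A minimal sketch of the client-side retry mentioned above, using exponential backoff with "full jitter". Note that boto3 already retries some throttling and transient errors on its own; this hypothetical helper `with_retries` is for application-level retries, and which exceptions count as retryable is left to the caller as an assumption.

```python
import random
import time


def with_retries(operation, retryable=(Exception,), max_attempts=4,
                 base_delay=0.2, sleep=time.sleep, rand=random.random):
    """Call operation(); on a retryable error, back off and try again.

    The wait before retry n is a random fraction of base_delay * 2**n
    ("full jitter"). The last error is re-raised once max_attempts calls
    have all failed.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise
            sleep(rand() * base_delay * (2 ** attempt))
```

Usage: `with_retries(lambda: table.put_item(Item=item), retryable=(SomeTransientError,))`, where `table`, `item`, and `SomeTransientError` stand in for your own objects.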

So your problem has more to do with the time it takes for the data to reach the Elasticsearch server.
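One way to reason about this delay is to treat the time until a write becomes searchable as the sum of the per-hop delays in the pipeline from the question. The terms are labels introduced here, not measured values: the S3 event delivery delay, the Lambda run time (listing versions and writing to DynamoDB), the DynamoDB stream delay, the time to send the document to Elasticsearch, and the wait for the next index refresh, since Elasticsearch makes newly indexed documents searchable only after a refresh (default `index.refresh_interval` of 1s). Any client retries after a failed write add to the total.

```latex
t_{\text{visible}} \approx t_{\text{S3 event}} + t_{\text{Lambda}} + t_{\text{stream}} + t_{\text{index}} + t_{\text{refresh}},
\qquad 0 \le t_{\text{refresh}} \le \text{refresh\_interval}
```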

If you want something more consistent that reflects the current state without any delay (since that is the problem you will end up with), you need to change tools.
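Both answers point to a different store when the latest version must be visible immediately, and the first suggests RDS. As a rough illustration of why a relational store avoids the lag, here is a sketch that uses Python's built-in SQLite as a stand-in for a database hosted on RDS. The `document_versions` table and the function names are hypothetical. A committed transaction is visible to the next query on the same database, with no asynchronous indexing step in between.

```python
import sqlite3

# SQLite stands in for a relational database such as one hosted on RDS;
# the schema and names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS document_versions (
    doc_id  TEXT    NOT NULL,
    version INTEGER NOT NULL,
    body    TEXT    NOT NULL,
    PRIMARY KEY (doc_id, version)
)
"""


def save_version(conn, doc_id, body):
    """Append a new version of doc_id in one transaction and return its number.

    A concurrent writer that computed the same number would violate the
    primary key, and its transaction would be rolled back.
    """
    with conn:  # commits on success, rolls back on exception
        (current,) = conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM document_versions WHERE doc_id = ?",
            (doc_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO document_versions (doc_id, version, body) VALUES (?, ?, ?)",
            (doc_id, current + 1, body),
        )
    return current + 1


def latest(conn, doc_id):
    """Return (version, body) of the newest version, or None if there is none."""
    return conn.execute(
        "SELECT version, body FROM document_versions WHERE doc_id = ? "
        "ORDER BY version DESC LIMIT 1",
        (doc_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_version(conn, "doc-1", "first draft")
save_version(conn, "doc-1", "second draft")
print(latest(conn, "doc-1"))  # (2, 'second draft')
```

Full-text search would then need either the database's own text search features or a separate index, which brings back some of the lag discussed above.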