AWS DAX stability issues


I am attempting to introduce DAX into our architecture, so far without success. We connect to DAX from Lambda functions, and the setup follows the examples in the AWS documentation. The Lambdas and the DAX cluster are in the same VPC, they can reach each other most of the time, and DAX returns responses. Port 8111 is open on the DAX cluster.

However, after running our regression tests a few times, errors start to appear in CloudWatch. The most frequent ones are:

  • "Failed to pull from [daxurlhere] (10.0.1.177,10.0.1.25,10.0.2.11): TimeoutError: Connection timeout after 10000ms"
  • Error: NoRouteException: not able to resolve address: [{"host":"[daxurlhere]","port":8111}]
  • ERROR caught exception during cluster refresh: DaxClientError: NoRouteException: not able to resolve address:[{"host":"[daxurlhere]","port":8111}]
  • ERROR Failed to resolve [daxurl]: Error: queryA ECONNREFUSED [daxurl]

When these errors occur, they break a few of our regression tests. The odd thing is that they are intermittent, which makes the issue very hard to track down.

Any suggestions would be more than welcome!


There are 2 answers

ramamoorthy_villi On

Your configuration seems fine. Check the following:

1. Make sure you are not performing strongly consistent reads

From the AWS doc:

DAX can't serve strongly consistent reads by itself because it's not tightly coupled to DynamoDB. For this reason, any subsequent reads from DAX would have to be eventually consistent reads

Also make sure your requests can actually be served from the cache. The following query sets ConsistentRead to false, yet its parameters differ on every call:

    const AWS = require('aws-sdk');
    const AmazonDaxClient = require('amazon-dax-client');

    const parameters = {
      TableName: 'Travels',
      ConsistentRead: false,
      ExpressionAttributeNames: {
        '#createdAt': 'createdAt',
      },
      ExpressionAttributeValues: {
        ':createdAt': Date.now(), // <-- look at this: a new value on every call
      },
      KeyConditionExpression: '#createdAt >= :createdAt',
    };

    // DAX_CLUSTER_ENDPOINT and region are defined elsewhere;
    // the await below must run inside an async function.
    const endpoint = DAX_CLUSTER_ENDPOINT;
    const daxService = new AmazonDaxClient({ endpoints: [endpoint], region });
    const daxClient = new AWS.DynamoDB.DocumentClient({ service: daxService });
    const response = await daxClient.query(parameters).promise();

Date.now() does not return the same value on every call. If a request does not exactly match a previous request, it will not be a cache hit. Check the parameters on your large requests as well, such as Limit, ProjectionExpression, and ExclusiveStartKey.
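One way to keep such a query cacheable is to round the timestamp down to a fixed window, so that requests issued within the same window carry identical parameters. This is only a minimal sketch, assuming a one-minute window is acceptable for your data; `bucketTimestamp` and `buildQueryParams` are hypothetical helper names, not part of the DAX client.

```javascript
// Hypothetical helper: round a millisecond timestamp down to the start of its window.
function bucketTimestamp(nowMs, windowMs = 60000) {
  return Math.floor(nowMs / windowMs) * windowMs;
}

// Hypothetical helper: build query parameters that stay identical within a window.
// The table and attribute names are taken from the example above.
function buildQueryParams(nowMs) {
  return {
    TableName: 'Travels',
    ConsistentRead: false,
    ExpressionAttributeNames: { '#createdAt': 'createdAt' },
    ExpressionAttributeValues: { ':createdAt': bucketTimestamp(nowMs) },
    KeyConditionExpression: '#createdAt >= :createdAt',
  };
}

// Two requests 30 seconds apart, inside the same one-minute window.
const a = buildQueryParams(1700000000000);
const b = buildQueryParams(1700000030000);
console.log(JSON.stringify(a) === JSON.stringify(b));
```

The trade-off is granularity: a wider window should give more cache hits but a coarser cutoff for `createdAt`.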

2. Check the cluster's monitoring tab and the CloudWatch query/scan cache hit metrics to confirm that the cluster is actually caching data.
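The cache-hit check can also be scripted. The sketch below only builds the request parameters and computes a hit ratio from two sums; it assumes the `AWS/DAX` namespace, the `QueryCacheHits` and `QueryCacheMisses` metric names, and the `ClusterId` dimension, so verify those against your CloudWatch console. `metricParams`, `cacheHitRatio`, and the cluster name `my-dax-cluster` are placeholders for illustration.

```javascript
// Build parameters for a CloudWatch GetMetricStatistics request.
// Namespace, metric, and dimension names are assumptions; check your console.
function metricParams(clusterId, metricName, endTime, minutes = 60) {
  return {
    Namespace: 'AWS/DAX',
    MetricName: metricName,
    Dimensions: [{ Name: 'ClusterId', Value: clusterId }],
    StartTime: new Date(endTime.getTime() - minutes * 60 * 1000),
    EndTime: endTime,
    Period: 300, // five-minute data points
    Statistics: ['Sum'],
  };
}

// Fraction of requests served from the cache; 0 when there was no traffic.
function cacheHitRatio(hits, misses) {
  const total = hits + misses;
  return total === 0 ? 0 : hits / total;
}

const hitParams = metricParams('my-dax-cluster', 'QueryCacheHits', new Date());
// With AWS SDK v2 these parameters would be passed to:
//   new AWS.CloudWatch().getMetricStatistics(hitParams).promise()
console.log(hitParams.MetricName, cacheHitRatio(90, 10));
```

A ratio that stays near zero while the tests run would suggest that requests are not matching earlier ones, as described in step 1.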

3. Other helpful links:

BillMan On

Be aware that although DAX distributes reads among the nodes in the cluster, all writes go through the master node. We have seen cascading failover of nodes during write-intensive periods: the master node gets overwhelmed and reboots, another node becomes master and reboots in turn, and so on.