I have a latency-sensitive application that uses the Lettuce client to connect to an AWS ElastiCache Redis cluster. On a cache miss, it falls back to Elasticsearch to fetch the required data.
I am testing a failover scenario: I delete the Redis cluster and then bring it back, to check whether the Redis failure is detected and whether the client reconnects once the cluster is back. I have configured two timeouts:
- socket timeout = 500ms
- command timeout = 100ms
While Redis is down, instead of failing immediately with a connection error, each request waits a very long time (more than 3 s) before it gets a failure. I would expect the failure to be detected right away so the request can go to the original data store (Elasticsearch) for the data. As it is, this delay defeats the purpose of having a cache and slows the service down well beyond the acceptable range.
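For reference, here is a minimal sketch of the fail-fast behavior described above, written against the JDK only (Java 9+ for `CompletableFuture.completeOnTimeout`) rather than against Lettuce. The cache and database lookups are hypothetical `Supplier` stand-ins, and the `getWithDeadline` helper and the 100 ms budget are illustrative assumptions, not Lettuce API. The idea is to give the cache call a hard client-side deadline and to treat any timeout, error, or miss as a reason to go straight to the original data store.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class CacheWithFallback {

    /**
     * Hypothetical helper: reads from the cache but never waits longer than
     * budgetMillis. If the cache call times out, throws, or returns null
     * (a miss), the value is loaded from the fallback source instead.
     */
    public static <V> V getWithDeadline(Supplier<V> cacheLookup,
                                        Supplier<V> fallbackLookup,
                                        long budgetMillis,
                                        ExecutorService executor) {
        V cached = CompletableFuture.supplyAsync(cacheLookup, executor)
                // Hard client-side deadline: complete with null if the cache is too slow.
                .completeOnTimeout(null, budgetMillis, TimeUnit.MILLISECONDS)
                // Treat any cache error (e.g. connection refused) like a miss.
                .exceptionally(ex -> null)
                .join();
        return cached != null ? cached : fallbackLookup.get();
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newCachedThreadPool();
        try {
            // Hypothetical stand-in for an unreachable Redis: hangs for 3 s.
            Supplier<String> hangingCache = () -> {
                try {
                    Thread.sleep(3_000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return "from-cache";
            };
            // Hypothetical stand-in for the Elasticsearch lookup.
            Supplier<String> database = () -> "from-db";

            long start = System.nanoTime();
            String value = getWithDeadline(hangingCache, database, 100, executor);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            System.out.println(value + " after ~" + elapsedMs + " ms");
        } finally {
            // Interrupts the abandoned cache call so the JVM can exit.
            executor.shutdownNow();
        }
    }
}
```

Note that `completeOnTimeout` only stops the caller from waiting: the abandoned cache call keeps running on the executor until it returns or is interrupted, so the executor needs enough capacity to absorb those stuck calls while Redis is down.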
Once the Redis cluster is up and running again, the client reconnects shortly afterward.
I configured these timeouts expecting them to kick in and detect the failure within an acceptable time, but that is not what happens.