Can't reconnect to Azure Redis via StackExchange.Redis


Caveat: this is a weird one, and I'm not sure whether SO is the right place.

I have an Azure Website connecting to an Azure Redis Cache instance (using StackExchange.Redis).

Everything was working, then one day the website couldn't connect to Redis.

Error:

It was not possible to connect to the redis server(s); to create a disconnected multiplexer, disable AbortOnConnectFail. SocketFailure on PING

Here's my connection string:

mycache.redis.cache.windows.net,ssl=true,password=xxxxxx,syncTimeout=5000

Here are my diagnostic steps:

  1. Connect from my local machine to Azure Redis. Result: SUCCESS (so the code is good?)
  2. Spin up a NEW Azure Redis instance and connect to it from the Azure Website. Result: FAIL (the website can't connect to ANY Azure Redis instance?)
  3. Spin up a NEW Azure Website with the same code as the failing site, pointing to the existing Redis cache. Result: SUCCESS (um, what?)
  4. Create a new MVC website, add StackExchange.Redis, deploy it to a new Azure Website, and connect to Redis. Result: SUCCESS (so Redis is good?)
  5. Deploy the vanilla MVC website from step 4 to the existing Azure Website (same code as step 4, connecting to the same Redis; the only difference is that it uses the old Azure Website's physical machine/networking). Result: FAIL (wtf??)

So I'm thinking Redis has "blacklisted" the Azure Website? (Is that even possible?) I know that the client (my code) won't keep trying to reconnect, but I've bounced the site many times and it still can't reconnect to Redis.

The fact that a new Azure Website, running the same code against the same Redis instance, connects successfully tells me that some kind of blacklisting/routing issue has occurred in Azure/Redis.

Any ideas?

EDIT

It looks like the problem is the Azure VNET. When my website is part of the Azure Virtual Network, it can't connect to Redis. When I take it out of the network, it connects fine. Before today, this setup was working.

So I'm wondering whether Azure has made a change so that websites in a VNET cannot connect to Azure Redis? (Makes no sense, I know.)

EDIT 2:

Attached are the logs from the Redis connection attempt.

Exception: It was not possible to connect to the redis server(s); to create a disconnected multiplexer, disable AbortOnConnectFail. SocketFailure on PING

connection-string-removed:6380,password=password-removed,ssl=True
Connecting connection-string-removed:6380/Interactive...
BeginConnect: connection-string-removed:6380
1 unique nodes specified
Requesting tie-break from connection-string-removed:6380 > __Booksleeve_TieBreak...
Allowing endpoints 00:00:05 to respond...
Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=4,Free=32763,Min=1,Max=32767)
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=5,Free=32762,Min=1,Max=32767)
connection-string-removed:6380 did not respond
Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=5,Free=32762,Min=1,Max=32767)
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=5,Free=32762,Min=1,Max=32767)
connection-string-removed:6380 failed to nominate (WaitingForActivation)
No masters detected
connection-string-removed:6380: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Connecting; sub: Connecting; not in use: DidNotRespond
connection-string-removed:6380: int ops=0, qu=2, qs=0, qc=0, wr=0, socks=1; sub ops=0, qu=0, qs=0, qc=0, wr=0, socks=1
Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0 (0.00 ops/s; spans 10s)
Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
resetting failing connections to retry...
retrying; attempts left: 2...
1 unique nodes specified
Requesting tie-break from connection-string-removed:6380 > __Booksleeve_TieBreak...
Allowing endpoints 00:00:05 to respond...
Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=6,Free=32761,Min=1,Max=32767)
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=7,Free=32760,Min=1,Max=32767)
connection-string-removed:6380 did not respond
Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=7,Free=32760,Min=1,Max=32767)
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=7,Free=32760,Min=1,Max=32767)
connection-string-removed:6380 failed to nominate (WaitingForActivation)
No masters detected
connection-string-removed:6380: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Connecting; sub: Connecting; not in use: DidNotRespond
connection-string-removed:6380: int ops=0, qu=2, qs=0, qc=0, wr=0, async=3, socks=2; sub ops=0, qu=0, qs=0, qc=0, wr=0, socks=2
Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0 (0.00 ops/s; spans 10s)
Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
resetting failing connections to retry...
retrying; attempts left: 1...
1 unique nodes specified
Requesting tie-break from connection-string-removed:6380 > __Booksleeve_TieBreak...
Allowing endpoints 00:00:05 to respond...
Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=8,Free=32759,Min=1,Max=32767)
EndConnect: connection-string-removed:6380 (socket shutdown)
Connect complete: connection-string-removed:6380
All tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=11,Free=32756,Min=1,Max=32767)
connection-string-removed:6380 faulted: SocketFailure on PING
Awaiting task completion, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=11,Free=32756,Min=1,Max=32767)
Not all tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=7,Free=32760,Min=1,Max=32767)
connection-string-removed:6380 failed to nominate (WaitingForActivation)
No masters detected
connection-string-removed:6380: Standalone v2.0.0, master; keep-alive: 00:01:00; int: Connecting; sub: Connecting; not in use: DidNotRespond
connection-string-removed:6380: int ops=0, qu=2, qs=0, qc=0, wr=0, async=7, socks=3; sub ops=0, qu=0, qs=0, qc=0, wr=0, socks=3
Circular op-count snapshot; int: 0 (0.00 ops/s; spans 10s); sub: 0 (0.00 ops/s; spans 10s)
Sync timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
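For anyone wanting to reproduce this kind of trace: StackExchange.Redis can write its connection log to a TextWriter passed to ConnectionMultiplexer.Connect. The following is only a minimal sketch of that approach, not necessarily how the log above was produced. The class name is hypothetical and the connection string is the placeholder from the question, not a real endpoint.

```csharp
using System;
using System.IO;
using StackExchange.Redis;

// Hypothetical class name, for illustration only.
class RedisConnectLogSketch
{
    static void Main()
    {
        // Placeholder connection string taken from the question; host and password are not real.
        const string configuration =
            "mycache.redis.cache.windows.net,ssl=true,password=xxxxxx,syncTimeout=5000";

        // The multiplexer writes its connection diagnostics to this writer.
        var log = new StringWriter();
        try
        {
            using (ConnectionMultiplexer muxer = ConnectionMultiplexer.Connect(configuration, log))
            {
                Console.WriteLine("Connected: " + muxer.IsConnected);
            }
        }
        catch (RedisConnectionException ex)
        {
            // With the default AbortOnConnectFail=true, a failed connect throws here.
            Console.WriteLine("Connect failed: " + ex.Message);
        }
        finally
        {
            // Dump the captured log whether or not the connect succeeded.
            Console.WriteLine(log.ToString());
        }
    }
}
```

Running the same program from both the working and the failing website would give two logs to compare side by side.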

Can anyone decipher this?

There are 3 answers

2
Aleks B On BEST ANSWER

I'm with the Azure Web Apps team - it looks like your VNET got into a particularly strange state, and was interrupting network connectivity for your app. I have fixed this behavior.

We are incredibly sorry for the inconvenience...

2
mikkark On

We might be experiencing the same issue. I was able to put a test app onto a web app in Azure without a virtual network, and it works right out of the box (using StackExchange.Redis). When I put the same code onto a web app that's part of a virtual network, it doesn't work.

I managed to fix the first error ("It was not possible to connect to the redis server(s)..") by setting the AbortOnConnectFail to false. Then I got the error "No connection is available to service this operation: EXISTS foo" (= in this case trying to check if key 'foo' exists).

I was able to fix that too by setting ConnectTimeout to 10 seconds. So basically, I can get it to work, but that seems to cause long delays when (I assume) SE.Redis loses the connection and tries to reconnect.
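The two settings mentioned above can be applied roughly as follows. This is a minimal sketch under stated assumptions rather than the exact code used: the class name is hypothetical, the connection string is the placeholder from the question, and ConnectTimeout is given in milliseconds.

```csharp
using System;
using StackExchange.Redis;

// Hypothetical class name, for illustration only.
class RedisTolerantConnectSketch
{
    static void Main()
    {
        // Placeholder connection string from the question; host and password are not real.
        ConfigurationOptions options = ConfigurationOptions.Parse(
            "mycache.redis.cache.windows.net,ssl=true,password=xxxxxx,syncTimeout=5000");

        // Return a disconnected multiplexer instead of throwing when the first connect fails.
        options.AbortOnConnectFail = false;

        // Allow up to 10 seconds for the connection to be established (value is in milliseconds).
        options.ConnectTimeout = 10000;

        using (ConnectionMultiplexer muxer = ConnectionMultiplexer.Connect(options))
        {
            Console.WriteLine("Connected: " + muxer.IsConnected);

            IDatabase db = muxer.GetDatabase();
            try
            {
                // Same check as in the error message above: does key 'foo' exist?
                Console.WriteLine("foo exists: " + db.KeyExists("foo"));
            }
            catch (RedisConnectionException ex)
            {
                // Raised when no connection is available to service the operation.
                Console.WriteLine("No connection: " + ex.Message);
            }
        }
    }
}
```

Note that with AbortOnConnectFail disabled, Connect no longer throws on an initial connection failure, so each command has to be prepared for a connection exception instead.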

0
Nigrimmist On

If the posts above did not help, you can check the following:

  • Double-check that your password (access key) is correct.
  • Try checking/unchecking "Prefer 32-bit" in the project properties if your app is an executable.
  • Try turning SSL off on the Azure Redis side (from the UI) and setting useSsl to false.
  • Download the source code (from https://github.com/StackExchange/StackExchange.Redis) and try to debug the underlying issue.

Some of these steps helped me.