So, I have a strange situation. I have an Azure Worker Role that runs a self-hosted OWIN API server. This worker role has 2 instances (i.e., it's a farm). Communication with these instances is over SSL/443 and is round-robined by Azure's load balancer.
I have hundreds of clients in the field that communicate with this Worker Role every minute, sending data to it. Every once in a while (say, about once a day), one of my Worker Role instances reboots (let's call it instance "A"). The reboot takes 3-4 minutes, and while it is in progress all of my clients switch over to the instance that is still up and running, "B". This is good. What happens after instance "A" finishes rebooting is not good: no new requests arrive at "A", and every client stays "stuck" on the non-rebooted instance "B". I end up with 2 load-balanced servers, one running at 1% CPU and the other at 60% CPU.
I suspect that the issue is with the way my clients connect to my worker role. Here's the code. What am I doing wrong? Is WebRequest.Create() an issue? Do I need to switch to a different client? Which one?
TIA
private string PostData(BearerToken token, Guid accountId, string path, object input)
{
    var url = _apiUrl + path;
    try
    {
        var client = WebRequest.Create(url);
        client.Headers.Add("Authorization", string.Format("Bearer {0}", token.AccessToken));
        client.Headers.Add("AccountId", accountId.ToString());
        client.Method = "POST";
        client.Timeout = 120000;
        client.ContentType = "application/json";
        var data = JsonConvert.SerializeObject(input);
        using (var stream = client.GetRequestStream())
        using (var writer = new StreamWriter(stream))
        {
            writer.Write(data);
            writer.Flush();
            writer.Close();
            using (var response = client.GetResponse())
            {
                using (var read = response.GetResponseStream())
                {
                    if (read == null) return string.Empty;
                    using (var reader = new StreamReader(read))
                    {
                        return reader.ReadToEnd();
                    }
                }
            }
        }
    }
    catch (Exception ex)
    {
        throw new Exception(string.Format("{0} ({1})", ex.Message, url), ex);
    }
}
Is your load balancer's distribution mode set to 2-tuple or 3-tuple? That could explain the affinity.
https://learn.microsoft.com/en-us/azure/load-balancer/load-balancer-distribution-mode
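
Separately from the distribution mode, it may be worth ruling out connection reuse on the client side. This is an assumption about the cause, not something confirmed in the post: Azure Load Balancer distributes TCP flows rather than individual HTTP requests, and on .NET Framework HttpWebRequest keeps connections alive by default, so clients that opened connections to "B" during the reboot could keep reusing them. A minimal sketch of one way to test that theory, assuming .NET Framework; the class and method names (ConnectionRecycling, LimitConnectionLifetime) and the 60-second value are hypothetical, not part of the original code:

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original post.
public static class ConnectionRecycling
{
    // Caps how long a pooled connection to the API host may be reused.
    // Once the lease expires, the connection is closed after its next
    // request, so the following request opens a new TCP connection that
    // the load balancer can assign to either instance.
    public static ServicePoint LimitConnectionLifetime(string apiUrl, int leaseMilliseconds)
    {
        // FindServicePoint only looks up (or creates) the connection-management
        // object for this host; it does not open a network connection.
        var servicePoint = ServicePointManager.FindServicePoint(new Uri(apiUrl));
        servicePoint.ConnectionLeaseTimeout = leaseMilliseconds;
        return servicePoint;
    }
}
```

Calling this once at client startup, for example ConnectionRecycling.LimitConnectionLifetime(_apiUrl, 60000), before any request is made, should be enough to try it. If the load evens out after the next reboot, long-lived keep-alive connections were likely the cause. A blunter alternative is to cast the result of WebRequest.Create(url) to HttpWebRequest and set KeepAlive = false, at the cost of a new TLS handshake on every request.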