I need to create service availability metrics for a Service Fabric service. If all nodes are up, it should report an OK state; if one node is down, a Warning state; and if multiple nodes are down, an Error state. I am a little confused about how to get the total number of nodes (up and down). I have written the code below, but it is not producing any metrics so far.
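    // Assumes usings for System, System.Collections.Generic, System.Fabric, System.Fabric.Query,
    // System.Linq, System.Threading and System.Threading.Tasks.
    public async Task LogServiceFabricHealthMetrics(ServiceContext serviceContext, CancellationToken cancellationToken)
    {
        // Create the client once and reuse it instead of creating a new one on every iteration.
        using (var fabricClient = new FabricClient())
        {
            // The application URI has the form fabric:/AppName. Splitting its LocalPath on '/' and
            // taking index 2 (as the service-name lookup did) goes out of range, so compare URIs instead.
            var applicationName = new Uri(serviceContext.CodePackageActivationContext.ApplicationName);

            while (!cancellationToken.IsCancellationRequested)
            {
                var nodeList = (await fabricClient.QueryManager.GetNodeListAsync()).ToList();
                var nodesRunningApplication = new List<Node>();
                foreach (var node in nodeList)
                {
                    // Note: this query can fail or time out for a node that is down.
                    var nodeApplicationList = await fabricClient.QueryManager.GetDeployedApplicationListAsync(node.NodeName);
                    if (nodeApplicationList.Any(p => p.ApplicationName == applicationName))
                    {
                        nodesRunningApplication.Add(node);
                    }
                }

                var missingNodes = nodeList.Count - nodesRunningApplication.Count;
                if (missingNodes == 0)
                {
                    // All nodes are up: report OK state
                }
                else if (missingNodes == 1)
                {
                    // Exactly one node is down: report Warning state
                }
                else
                {
                    // More than one node is down: report Error state
                }

                // Pass the token so the wait ends promptly when cancellation is requested.
                await Task.Delay(this.interval * 6 * 60 * 1000, cancellationToken);
            }
        }
    }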
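The OK/Warning/Error rule described in the question can be kept separate from the cluster queries so it is easy to check on its own. The following is only a minimal sketch of that rule; the names `Availability`, `NodeAvailability` and `Classify` are hypothetical and are not part of the Service Fabric SDK.

```csharp
using System;

// Hypothetical illustrative types, not Service Fabric SDK types.
public enum Availability { Ok, Warning, Error }

public static class NodeAvailability
{
    // Maps node counts to a state: 0 unavailable -> Ok, 1 -> Warning, 2 or more -> Error.
    // An empty cluster (0 total nodes) is classified as Ok here; adjust if that is not wanted.
    public static Availability Classify(int totalNodes, int availableNodes)
    {
        if (totalNodes < 0 || availableNodes < 0 || availableNodes > totalNodes)
        {
            throw new ArgumentOutOfRangeException(
                nameof(availableNodes), "Counts must satisfy 0 <= availableNodes <= totalNodes.");
        }

        int unavailable = totalNodes - availableNodes;
        if (unavailable == 0)
        {
            return Availability.Ok;
        }

        return unavailable == 1 ? Availability.Warning : Availability.Error;
    }
}
```

The result could then be translated to the SDK's own `System.Fabric.Health.HealthState` values (`Ok`, `Warning`, `Error`) when building a health report.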
Nodes
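The snippet that originally accompanied this heading is not available, so the following is only a sketch of one way to get the total and "up" node counts. It assumes the Service Fabric SDK (`System.Fabric`) is referenced and that the code runs somewhere a default `FabricClient` can reach the cluster; `NodeCounter` and `CountNodesAsync` are hypothetical names.

```csharp
using System.Fabric;
using System.Fabric.Query;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper, not part of the SDK.
public static class NodeCounter
{
    // Returns the total number of nodes reported by the cluster and how many have status Up.
    public static async Task<(int Total, int Up)> CountNodesAsync(FabricClient fabricClient)
    {
        // GetNodeListAsync returns every node known to the cluster, whatever its status.
        NodeList nodes = await fabricClient.QueryManager.GetNodeListAsync();

        // NodeStatus distinguishes Up from Down, Disabling, Disabled and other states.
        int up = nodes.Count(n => n.NodeStatus == NodeStatus.Up);

        return (nodes.Count, up);
    }
}
```

The difference `Total - Up` gives the number of nodes that are not up, which is the value the OK/Warning/Error decision in the question depends on.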
To fetch all the running apps
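The original snippet for this step is also missing. As a hedged sketch, the applications in the cluster can be listed through the query manager; this assumes the Service Fabric SDK is referenced, and `ApplicationLister` and `PrintApplicationsAsync` are hypothetical names.

```csharp
using System;
using System.Fabric;
using System.Fabric.Query;
using System.Threading.Tasks;

// Hypothetical helper, not part of the SDK.
public static class ApplicationLister
{
    // Writes the name, status and aggregated health state of each application in the cluster.
    public static async Task PrintApplicationsAsync(FabricClient fabricClient)
    {
        ApplicationList applications = await fabricClient.QueryManager.GetApplicationListAsync();

        foreach (Application application in applications)
        {
            Console.WriteLine(
                $"{application.ApplicationName} status={application.ApplicationStatus} health={application.HealthState}");
        }
    }
}
```

To see which applications are deployed on a particular node instead, `GetDeployedApplicationListAsync(nodeName)` from the question's code is the per-node counterpart.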
For further information refer to this Blog and SO Link.