How to calculate service uptime/downtime duration in Grafana/Prometheus


I'm working on a Grafana dashboard to monitor the uptime/downtime of some of our services. The metric I use is called service_status, which can only be 0 or 1 (0 means the service is down, 1 means it is up). The Prometheus scrape interval is set to 1m.
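Because service_status is a 0/1 gauge, its current value already says which state each service is in, and the PromQL function `changes()` counts how many times a value flipped within a range. The two queries here are a sketch that uses the metric and `service` label from the question; the 30-day window is an assumption chosen to match a 30-day lookback.

```promql
# Current state of every service (1 = up, 0 = down).
service_status

# Services whose status did not change at all during the last 30 days.
# For these there is no "last transition" inside the window to measure from.
changes(service_status[30d]) == 0
```

Any service returned by the second query is one for which a "time since last transition" query limited to 30 days has nothing to return.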

What I need is to calculate, for each service, how long it has been in its current state: if the service is currently up, how long it has been up since the last failure; if it is currently down, how long it has been down. It will look like this:

[Image: desired output]

The only approach I can think of right now is to use the idelta() function to find the last time service_status changed, then subtract that timestamp from the current time to get the duration, choosing between the up case and the down case with an if/else on the current service_status. But my query sometimes returns no data, and I don't know what's wrong:

    time() - max_over_time(timestamp(idelta(service_status{service="serviceA"}[2m]) > 0)[30d:1m]) and on() (service_status > 0)
    or on() max_over_time(timestamp(idelta(service_status{service="serviceA"}[2m]) < 0)[30d:1m]) - time()
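A likely cause of the missing data: if serviceA did not change state at any point in the 30-day subquery window, `idelta(...) > 0` and `idelta(...) < 0` never match, so both branches are empty and the whole expression returns nothing. Two other details are worth checking: `and on() (service_status > 0)` is not restricted to serviceA, so the up branch can be picked while serviceA itself is down (as long as some other service is up), and the down branch computes `timestamp - time()`, which is always negative.

One way to avoid the if/else altogether, sketched here under the question's assumptions (0/1 gauge, 1m scrape interval, 30-day lookback): the duration of the current state is the time since the last transition in either direction, and the state itself is just `service_status`. This swaps `idelta()` for `changes()`. The fallback of 30 days for services with no transition in the window is an assumption and should be read as "at least 30 days". The block contains two alternative queries; use one per Grafana query.

```promql
# Query 1: seconds since service_status last changed value, per service.
# changes(...[2m]) > 0 keeps only subquery steps that saw a transition in
# the preceding 2 minutes; timestamp() turns each such step into its
# evaluation time, and max_over_time picks the most recent one.
(
  time()
  - max_over_time(timestamp(changes(service_status[2m]) > 0)[30d:1m])
)
# Fallback (assumption): services with no transition in the 30-day window
# get 30 days in seconds, meaning "at least 30 days".
or
(
  service_status * 0 + 30 * 24 * 3600
)

# Query 2 (optional signed variant, no fallback): positive while the
# service is up, negative while it is down, since 2 * status - 1 is +1 or -1.
(
  time()
  - max_over_time(timestamp(changes(service_status[2m]) > 0)[30d:1m])
)
* (2 * service_status - 1)
```

To limit either query to one service, add `{service="serviceA"}` to every `service_status` selector. With a 2m range and a 1m subquery step, the reported duration can trail the real transition by a minute or two. The `[30d:1m]` subquery evaluates 43,200 steps per series, so a shorter window or a recording rule may be worth considering if the panel is slow.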

Could someone tell me how to calculate this duration for service_status, or help me fix my PromQL query? Thanks a lot!
