I have a time-series:
sum(ALERTS{alertname="IngestionStopped", alertstate="firing"} unless on(table) (ALERTS{alertname="MyAlert1",alertstate="firing"} or ALERTS{alertname="MyAlert2",alertstate="firing"}) or vector(0))
I do a sum because I have 1 TS for each partition of the table. I am interested even if a single partition has its ingestion stopped.
This TS = 0 when my service is working fine. If it's > 0, it means there's something wrong with the server. I want to calculate the % of time my service was not working fine (meaning this TS was > 0). How can I do that?

For any query that produces a continuous output of 0 or 1, you can compute an average over time with the `avg_over_time` function.
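The general shape is something like this (`<your query>`, `<range>` and `<resolution>` are placeholders, not literal values):

```
avg_over_time((<your query>)[<range>:<resolution>])
```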
Here `range` is the time range over which you want to calculate the average, and `resolution` is how often your query should be evaluated within that range. `resolution` can also be omitted (without omitting the `:`); in that case the global evaluation interval (`evaluation_interval` from the configuration, `1m` by default) is used.

Since your query produces values other than 1 that, for the purposes of this exercise, should be treated as 1, modify it by appending `> bool 0`. The boolean comparison converts every value that satisfies the condition into 1.

So the final query would be:
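For example, assuming a 1-hour window and the default resolution (adjust both to your needs):

```
# Fraction of samples in the last hour where at least one partition had ingestion stopped
avg_over_time(
  (
    sum(
        ALERTS{alertname="IngestionStopped", alertstate="firing"}
      unless on(table)
        (ALERTS{alertname="MyAlert1", alertstate="firing"} or ALERTS{alertname="MyAlert2", alertstate="firing"})
      or vector(0)
    ) > bool 0
  )[1h:]
)
```

Multiply the result by 100 if you want it as a percentage rather than a fraction between 0 and 1.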
Adjust `resolution` according to your situation, but remember that alert rules are evaluated (and the `ALERTS` metric subsequently updated) only once every `evaluation_interval`, so there is no need to go very low there.

A demo of a similar query can be seen here.
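For instance, to sample the inner query every 5 minutes over a 24-hour window instead of relying on the global evaluation interval (both durations are only illustrative):

```
# <your query> stands for the expression built above
avg_over_time((<your query> > bool 0)[24h:5m])
```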