Prometheus query by label with range vectors


I'm defining a lot of counters in my app (using Java Micrometer), and in order to trigger alerts I tag the counters I want to monitor with "error":"alert", so a query like {error="alert"} will return multiple range vectors:

   error_counter_component1{error="alert", label2="random"}
   error_counter_component2{error="alert", label2="random2"}
   error_counter_component3{error="none", label2="random3"}

I don't control the names of the counters; I can only add the label to the counters I want to use in my alert. The alert I want is: if the counters labeled with error="alert" increase by more than 3 in one hour. So I could use this kind of query: increase({error="alert"}[1h]) > 3, but I get the following error in Prometheus: Error executing query: vector cannot contain metrics with the same labelset

Is there a way to merge two range vectors, or should I include some kind of tag in the name of the counter? Or should I have a single counter for errors, with tags that specify the source, something like this:

errors_counter{source="component1", use_in_alerts="yes"}
errors_counter{source="component2", use_in_alerts="yes"}
errors_counter{source="component3", use_in_alerts="no"}
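
With that single-counter layout, the alert query would presumably reduce to something like this (just a sketch, using the metric and label names from the example above):

   increase(errors_counter{use_in_alerts="yes"}[1h]) > 3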

1 Answer

bjakubski:

The version with a source="componentX" label fits the Prometheus data model much better. This assumes the errors_counter metric really is a single metric and that, apart from the source label value, it has the same labels etc. (for example, it is emitted by the same library or framework).
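
With one metric and a source label, each component becomes its own series of the same metric, so an expression like the following (a sketch based on the metric name from the question) no longer runs into the duplicate-labelset error:

   increase(errors_counter[1h]) > 3
   # or, limited to specific components:
   increase(errors_counter{source=~"component1|component2"}[1h]) > 3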

Adding something like a use_in_alerts label is not a great solution: such a label does not identify a time series. I'd put the list of components to alert on wherever your alerting queries are constructed and dynamically create separate alerting rules (without adding such a label to the raw data). Another option is a separate pseudo-metric that is only used to provide metadata about components, like:

   component_alert_on{source="component2"} 1

and combine it in the alerting rule so that you only alert on the components you need. It can be generated in any way you like; one possibility is to add it via a static recording rule. The downside is that it complicates the alerting query somewhat. But of course the use_in_alerts label will probably also work (at least as long as you only alert on this metric).
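
A sketch of such a combined alerting expression, assuming the errors_counter and component_alert_on series from above (the and on(source) set operator keeps only the components that have a metadata entry):

   (increase(errors_counter[1h]) > 3)
   and on(source)
   component_alert_on

Changing which components alert then only requires updating the component_alert_on series, not the alerting rule itself.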