Prometheus metrics join doesn't work as I expected


I have two Prometheus metrics, kube_pod_info and kube_pod_container_status_restarts_total, and I need to enrich my Telegram alert with data from both of them.

kube_pod_container_status_restarts_total{project="abc",env = "prod",namespace!="test"} returns {container: service-bridge-v0, deployconfig: service-bridge-v0-3, endpoint: https-main, env: prod, job: kube-state-metrics, mgroup: business, namespace: stowf-prod, origin_prometheus: Prometheus, pod: service-bridge-v0-8-fl4bq, project: abc, service: kube-state-metrics}

And kube_pod_info{project="abc",env="prod",namespace!="test"} returns {container: kube-abcd-proxy-main, created_by_kind: , created_by_name: , endpoint: https-main, env: prod, host_ip: 10.46.71.101, job: kube-state-metrics, mgroup: business, namespace: stowf-prod, node: sof-oc4m0w01.mycompany.org, origin_prometheus: Prometheus, pod: advertising-v1-4-deploy, pod_ip: 10.241.17.170, project: abc, service: kube-state-metrics, uid: 14e88aae-b3fb-4dd5-a77f-565725046489}

But I need output like this: {deployconfig: service-bridge-v0-8, env: prod, instance: service-bridge-v0-8-fl4bq, node: sof-oc4m0w02.mycompany.org, pod: service-bridge-v0-8-fl4bq, project: abc}

I tried "* on(pod)" and "* on(<any label present in both metrics>)", but I only get a "duplicate time series on the left side of * on (pod)" error.

Example query:
kube_pod_info{project="abc", env="prod", namespace!="test"} * on(pod) kube_pod_container_status_restarts_total{project="abc", env="prod", namespace!="test"}

Error:
cannot execute query: cannot evaluate "kube_pod_info{project="abc", env="prod", namespace!="test"} * on (pod) kube_pod_container_status_restarts_total{project="abc", env="prod", namespace!="test"}": duplicate time series on the left side of * on (pod)

Then I tried "ignoring" labels, but got "no more data to show".

Example query:
kube_pod_info{project="abc", env="prod", namespace!="test"} / ignoring(deployconfig, created_by_kind, created_by_name, host_ip, pod_ip, uid, node) kube_pod_container_status_restarts_total{project="abc", env="prod", namespace!="test"}

Grouping with group_left (many-to-one matching) works fine, but I need to cut off the unnecessary labels, because I don't need them in the alert message.

Example query:
kube_pod_info * on(uid) group_left(instance) (rate(kube_pod_container_status_restarts_total{project="abc", env="prod", namespace!="test"}[10m]) * 600)

Result:
{container: kube-abcd-proxy-main, created_by_kind: ReplicationController, created_by_name: service-bridge-v0-8, deployconfig: service-bridge-v0-8, endpoint: https-main, env: prod, host_ip: 10.46.71.101, instance: service-bridge-v0-8-fl4bq, job: kube-state-metrics, mgroup: business, namespace: stowf-prod, node: sof-oc4m0w02.mycompany.org, origin_prometheus: Prometheus, pod: service-bridge-v0-8-fl4bq, pod_ip: 10.241.17.170, project: abc, service: kube-state-metrics, uid: 0680c9d5-5364-4509-90d5-c8d7f21ac352}

So I need help to either "ignore" the unneeded labels or cut them off from the result of the last query. Thank you!


There is 1 answer

hagen1778 (best answer)

The on modifier matches series from the left and right sides using only the labels in the provided list. If more than one series on the same side has identical values for those labels, you get an error about duplicates. Consider the example:

kube_pod_container_status_restarts_total{namespace="ns", pod="pod", service="service1"}
kube_pod_container_status_restarts_total{namespace="ns", pod="pod", service="service2"}
kube_pod_labels{namespace="ns", pod="pod", service="service1"}
kube_pod_labels{namespace="ns", pod="pod", service="service2"}

If you run the following query:

kube_pod_container_status_restarts_total * on(pod, namespace) kube_pod_labels

you'll get an error about duplicates, because each namespace and pod pair matches two different time series on both sides. With every label outside on(...) dropped, the data is reduced to the following:

left:
{namespace="ns", pod="pod"}
{namespace="ns", pod="pod"}
right:
{namespace="ns", pod="pod"}
{namespace="ns", pod="pod"}
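
To find out which label pairs are duplicated, a quick check is to count the series per pair on each side. This is only a sketch that assumes the metric names from the example above. Run each expression separately; any group it returns has more than one series and cannot be matched one-to-one by on(pod, namespace).

```promql
# Left side: (namespace, pod) groups that contain more than one series
count by (namespace, pod) (kube_pod_container_status_restarts_total) > 1

# Right side: the same check for the other metric
count by (namespace, pod) (kube_pod_labels) > 1
```

Comparing the full label sets of the series in a returned group shows which extra label tells them apart.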

To make it work, you need to add a label to on that tells the series apart, for example on(namespace, pod, service), which results in the following time series set:

left:
{namespace="ns", pod="pod", service="service1"}
{namespace="ns", pod="pod", service="service2"}
right:
{namespace="ns", pod="pod", service="service1"}
{namespace="ns", pod="pod", service="service2"}
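
Applied to the example series above, the query would then look roughly like this (a sketch for the example data only):

```promql
# Each (namespace, pod, service) combination is now unique on both sides,
# so one-to-one matching succeeds.
kube_pod_container_status_restarts_total * on(namespace, pod, service) kube_pod_labels
```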

Please note that the service label was picked only as an example. Your data very likely has a different label, or a combination of labels, that makes each series unique.
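
For the label trimming asked about in the question, one possible approach is to wrap the group_left query that already works in an aggregation that keeps only the wanted labels. This is a sketch, assuming the label names from the desired output in the question are the ones to keep. Note that sum adds values together if several series end up with the same kept labels, so max by (...) may fit better in that case.

```promql
# Keep only the labels needed in the alert message; all others are dropped.
sum by (deployconfig, env, instance, node, pod, project) (
  kube_pod_info
    * on(uid) group_left(instance)
  (rate(kube_pod_container_status_restarts_total{project="abc", env="prod", namespace!="test"}[10m]) * 600)
)
```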