Trouble with Prometheus metrics (Adapter and metricsQuery)

Question

769 views Asked by Trarbish At 07 May 2023 at 16:37

Original problem: I would like a Kubernetes cluster that always keeps at least 2 nodes with zero GPU consumption. When a job arrives and takes one node, the autoscaler should create another spare node.

I found out that I can rely on the DCGM_FI_DEV_GPU_UTIL metric. If DCGM_FI_DEV_GPU_UTIL == 0, the node is in "idle" mode. In PromQL I can just write count(DCGM_FI_DEV_GPU_UTIL == 0) to get the number of "idle" nodes.

However, I do not understand how to write the metricsQuery in the Prometheus Adapter config. All the examples I found look like this:

sum(rate(<<.Series>>{<<.LabelMatchers>>}[1m])) by (<<.GroupBy>>)

What I need is something like count(<<.Series>> == 0), but this does not work. Any idea how I can expose a metric to the HPA that indicates the number of nodes with no GPU consumption?

There are 2 answers

Vitezslav Skacel On 07 May 2023 at 19:59

Your jobs are probably running in Kubernetes Pods. You may have a configuration where only one custom Pod with a job can run on a single Node. The first step is to configure your metrics for the Prometheus adapter, which is described quite nicely here. This step will ensure that the Pod is added.
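For illustration, an adapter rule for the first step might look like the untested sketch below. The field names (seriesQuery, resources, name, metricsQuery) come from the Prometheus Adapter config format. The metric name idle_gpu_count and the assumption that the DCGM series carry namespace and pod labels are hypothetical and depend on how the exporter is scraped. One possible reason a bare count(<<.Series>> == 0) fails is that the adapter expects the query to keep the <<.LabelMatchers>> and <<.GroupBy>> placeholders so it can return one value per requested object.

```yaml
# Untested sketch of a Prometheus Adapter rule, not a verified configuration.
rules:
  # Assumption: the DCGM series carry namespace and pod labels.
  - seriesQuery: 'DCGM_FI_DEV_GPU_UTIL{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^DCGM_FI_DEV_GPU_UTIL$"
      as: "idle_gpu_count"   # hypothetical metric name exposed to the HPA
    # Counts series whose current value is 0, per requested object.
    # Caveat: when nothing matches "== 0", PromQL returns an empty result
    # rather than 0, so the metric may be missing instead of zero.
    metricsQuery: 'count(<<.Series>>{<<.LabelMatchers>>} == 0) by (<<.GroupBy>>)'
```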

In the second step you need to configure a cluster autoscaler that will add another Node when needed. The cluster autoscaler depends on your Kubernetes provider (AWS, Azure, GCP...) and should be covered in their documentation. I personally use Cluster Autoscaler and Karpenter.

**Trarbish** · Accepted Answer · 2023-05-10T14:23:56+00:00

I ended up using KEDA with the prometheus trigger. It is easy to use and supports PromQL queries. The only disadvantage is that it is an "average value" scaler, but that is not critical in my case.
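The exact configuration used is not given, so the following is only a minimal sketch of how a KEDA ScaledObject with a prometheus trigger is typically wired. The object name, target Deployment, namespace, Prometheus address, replica bounds, and threshold are all hypothetical placeholders. Only the PromQL expression comes from the question, with `or vector(0)` added so the query still returns 0 when no series matches.

```yaml
# Minimal sketch, assuming KEDA is installed and Prometheus is reachable
# at the address below. All names and numbers are placeholders.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: idle-gpu-scaler            # hypothetical name
  namespace: default
spec:
  scaleTargetRef:
    name: gpu-worker               # hypothetical Deployment to scale
  minReplicaCount: 1
  maxReplicaCount: 10
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090   # assumed address
        # Number of series reporting zero GPU utilization; falls back to 0
        # when the comparison filters out every series.
        query: count(DCGM_FI_DEV_GPU_UTIL == 0) or vector(0)
        threshold: "2"             # placeholder target value per replica
```

Because the trigger behaves as an "average value" scaler, the desired replica count is roughly the query result divided by the threshold, rounded up. Whether that direction of scaling fits the spare-node goal depends on the workload being scaled.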
