Kubernetes Horizontal Pod Autoscaler with Prometheus Metrics (prometheus-adapter)


I have this use case:

When there is heavy load on a specific queue in RabbitMQ, I want to start more replicas. Let's say my app can handle 5 messages (= tasks) simultaneously and each takes 1 minute to complete. When there are more than 10 "ready" messages in the RabbitMQ queue, I want the HPA to start a new replica; at 20 "ready" messages, start 2; at 30 "ready" messages, start 3; and so on.

I used this Helm chart to install prometheus-adapter:

https://github.com/prometheus-community/helm-charts/tree/main/charts/prometheus-adapter

And in my Helm values YAML I added:

rules:
  default: true

  custom:
    - seriesQuery: '{__name__=~"rabbitmq_detailed_queue_messages_ready"}'
      name:
        matches: "^(.*)"
        as: "open_tasks"
      resources:
        overrides:
          kubernetes_namespace: { resource: "namespace" }
          kubernetes_name: { resource: "service" }
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,queue="my-task-queue"}) by (<<.GroupBy>>)

Now, this metric should be exposed, but it isn't:

$ kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq . | grep open_tasks

Now, this is my main question. After that, I could deploy an HPA for my app task-processor like this:

kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2
metadata:
  name: task-processor-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: task-processor
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Object
      object:
        metric:
          name: open_tasks
        describedObject:
          apiVersion: "/v1"
          kind: Service
          name: open_tasks
        target:
          type: Value
          value: 10

Now my questions:

  1. Why is the metric not exposed in the raw query?
  2. Is my YAML for the HPA correct? I have the feeling that I'm missing some essential stuff here, but I can't get my head around it.

There are 2 answers

Dion V:
It might be that the custom metrics API is not being served at all: Kubernetes does not understand custom metrics out of the box, and only exposes them once an adapter (such as prometheus-adapter) is registered as an APIService.
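You can verify whether the custom metrics API is being served and what the adapter actually exposes. A minimal check, assuming prometheus-adapter was installed with its default release name into the monitoring namespace (adjust names and namespace to your setup):

# Is the custom metrics APIService registered and Available?
kubectl get apiservice v1beta1.custom.metrics.k8s.io

# Which metric names does the adapter currently expose?
kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1 | jq '.resources[].name'

# Did the adapter log errors while loading your custom rules?
kubectl logs deploy/prometheus-adapter -n monitoring

If the APIService is available but your metric is missing from the list, a common cause is that the labels referenced under resources.overrides (here kubernetes_namespace and kubernetes_name) do not actually exist on the rabbitmq_detailed_queue_messages_ready series, in which case the adapter will not expose it.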

 

You can check Santhosh Nagaraj's article Scaling Celery workers with RabbitMQ on Kubernetes, which provides insights into your use case using KEDA.

mhadidg:

Your use case is essentially autoscaling based on RabbitMQ queue length, which is a bit challenging given the limitations of the native Kubernetes HPA controller, as it typically autoscales based on CPU and memory usage.

Alternatively, you may consider utilizing KEDA (Kubernetes Event-Driven Autoscaling):

KEDA is a Kubernetes-based event-driven autoscaler. With KEDA, you can drive the scaling of any deployment based on various criteria (depending on the scaler). It has a built-in RabbitMQ scaler, which makes it particularly suitable for your case.

Here's a sample ScaledObject for your case:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: task-processor-scaledobject
spec:
  scaleTargetRef:
    apiVersion: apps/v1 # Optional (default: apps/v1)
    kind: Deployment
    name: task-processor
  triggers:
  - type: rabbitmq
    metadata:
      queueName: 'my-task-queue'
      mode: QueueLength # Trigger on the number of messages in the queue.
      value: '5'  # Target number of tasks per pod.
    authenticationRef:
      name: rabbitmq-trigger-auth # Provides the RabbitMQ connection string (see below).
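Note that the RabbitMQ scaler also needs connection details for the broker. A minimal sketch of how that is commonly supplied via a Secret plus a TriggerAuthentication (the names rabbitmq-conn and rabbitmq-trigger-auth, and the connection string, are placeholders for your own setup):

apiVersion: v1
kind: Secret
metadata:
  name: rabbitmq-conn
type: Opaque
stringData:
  host: "amqp://user:password@rabbitmq.default.svc.cluster.local:5672/" # AMQP connection string (include the vhost if you use one).
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: rabbitmq-trigger-auth
spec:
  secretTargetRef:
  - parameter: host      # Maps the secret value to the scaler's "host" parameter.
    name: rabbitmq-conn  # The Secret defined above.
    key: host

With mode: QueueLength, KEDA (through the HPA it creates) targets roughly ceil(readyMessages / value) replicas, bounded by the ScaledObject's minReplicaCount and maxReplicaCount, so you can tune value to control how many messages each replica should handle (e.g. value: '10' for one replica per 10 ready messages).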