Send kubernetes(GKE) service layer metrics to GCP Load Balancer

I am using GKE and have an application-app1(pod) which is exposed using NodePort and then put behind an ingress.

The ingress-controller has launched a GCP load balancer. Now, the requests coming on path /app1/ are routed to my application.

I launched the stackdriver-metrics adapter inside the cluster and then I configured an HPA which uses requests/second metrics from the load balancer. HPA gets the metrics from ExternalMetric for a particular backend name.

  - external:
      metricName: loadbalancing.googleapis.com|https|request_count
      metricSelector:
        matchLabels:
          resource.labels.backend_target_name: k8s-be-30048--my-backend
      targetAverageValue: 20
    type: External

Everything works perfectly. Here is the problem:

Some of the other apps running inside the Kubernetes cluster also call app1. They call it by its Kubernetes FQDN, app1.default.svc.cluster.local, and not via the load balancer route. That means these requests don't go through the ingress load balancer, so they are not counted by the HPA in any way.

So the total requests (Ct) arrive via the load balancer (C1) and via the FQDN (C2): Ct = C1 + C2. My guess is that the HPA will only take C1 into account and not Ct, so it will not scale my app correctly. For example, if Ct is 120 but C1 is 90, then the number of pods will be 3 but it should actually be 4.

Am I wrong here to consider that requests coming via FQDN are not counted by the load balancer?

If the requests are not being counted, I think I will have to use something that counts requests at the pod level, something like a Prometheus middleware. Can you guys suggest anything else?

There is 1 answer

Dawid Kruk On

Answering the following comment:

Yup, that's the obstruction. No way to forecast/relate the kind of traffic. Anyway, how would it help if it could be forecasted?

If it could be forecasted (for example, it's always 70% external / 30% internal), you could adjust the scaling metric to already include the traffic that the load balancer metric isn't aware of.
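
As a rough sketch of that adjustment, assume the split really is a stable 70% external / 30% internal and the goal is about 20 total requests per second per pod. The load balancer then sees only 0.7 × 20 = 14 of those requests, so the target on the load balancer metric drops to 14. The fragment below is a single entry for spec.metrics, written in the autoscaling/v2 field layout; the question uses the older metricName/targetAverageValue fields, where the equivalent change is targetAverageValue: 14. The 70/30 ratio is an assumption, not something measured here.

```yaml
# One entry under spec.metrics of an autoscaling/v2 HorizontalPodAutoscaler.
# Assumption: external traffic is a stable 70% of all traffic to app1.
- type: External
  external:
    metric:
      name: loadbalancing.googleapis.com|https|request_count
      selector:
        matchLabels:
          resource.labels.backend_target_name: k8s-be-30048--my-backend
    target:
      type: AverageValue
      # 20 total req/s per pod * 0.7 visible to the load balancer = 14
      averageValue: "14"
```

With these numbers, a total of 120 requests per second shows up as 84 on the load balancer, and 84 / 14 gives the same 6 pods as 120 / 20. If the ratio drifts, the scaling drifts with it.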


Instead of collecting metrics from the load balancer itself which will not take into consideration the internal traffic, you can opt to use Custom Metrics (for example: queries per second).

Your application can report a custom metric to Cloud Monitoring. You can configure Kubernetes to respond to these metrics and scale your workload automatically. For example, you can scale your application based on metrics such as queries per second, writes per second, network performance, latency when communicating with a different application, or other metrics that make sense for your workload. A custom metric can be selected for any of the following:

  • A particular node, Pod, or any Kubernetes object of any kind, including a CustomResourceDefinition (CRD).
  • The average value for a metric reported by all Pods in a Deployment

-- Cloud.google.com: Kubernetes Engine: Custom and external metrics: Custom metrics
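
A minimal sketch of what scaling on such a per-Pod metric might look like, assuming app1 runs as a Deployment named app1 and reports a requests-per-second metric of its own. The metric name app1_requests_per_second is hypothetical, and the exact name the adapter exposes depends on how the metric is exported, so check it against your adapter before relying on this. The replica bounds are placeholders as well.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app1
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app1              # assumption: app1 is a Deployment with this name
  minReplicas: 1            # placeholder bounds
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: app1_requests_per_second   # hypothetical custom metric name
      target:
        type: AverageValue
        averageValue: "20"  # same per-pod target as in the question
```

Because each Pod reports the requests it actually served, traffic arriving through the service FQDN is counted the same way as traffic from the load balancer.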

The official Google Cloud documentation covers creating custom metrics.

You can also look at the metrics that are already available in the Metrics Explorer.


You can also use multiple metrics when scaling up/down with HPA:

If you configure a workload to autoscale based on multiple metrics, HPA evaluates each metric separately and uses the scaling algorithm to determine the new workload scale based on each one. The largest scale is selected for the autoscale action.

-- Cloud.google.com: Kubernetes Engine: HorizontalPodAutoscaler
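
For illustration, the metrics list of an autoscaling/v2 HPA could carry both the load balancer metric and a per-Pod metric, and the HPA would act on whichever asks for more replicas. This is a sketch only: app1_requests_per_second is a hypothetical custom metric name, and both targets are assumed values.

```yaml
# spec.metrics of an autoscaling/v2 HorizontalPodAutoscaler with two metrics.
metrics:
- type: External
  external:
    metric:
      name: loadbalancing.googleapis.com|https|request_count
      selector:
        matchLabels:
          resource.labels.backend_target_name: k8s-be-30048--my-backend
    target:
      type: AverageValue
      averageValue: "20"
- type: Pods
  pods:
    metric:
      name: app1_requests_per_second   # hypothetical custom metric name
    target:
      type: AverageValue
      averageValue: "20"
```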

As more of a workaround, you could also scale on the CPU usage metric.
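
A CPU-based entry might look like the sketch below, again as an item under spec.metrics of an autoscaling/v2 HPA. The 60% threshold is an arbitrary example value, and utilization targets only work when the app1 containers declare CPU requests.

```yaml
# One entry under spec.metrics; requires CPU requests on the app1 containers.
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 60   # example threshold, tune for your workload
```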

