Kubernetes - Horizontal Pod Autoscaler error: target "unknown", message "no recommendation"


I have a working Kubernetes 1.23.9 setup hosted on Google Kubernetes Engine with multi-cluster services enabled: one cluster is hosted in the US and another in the EU. I have multiple Deployments, each with an HPA configured through YAML. Out of 7 Deployments, the HPA works for only one. service-1 can only be accessed internally from service-2, and service-2 is exposed through HttpGateway by GKE. Please find more info below. Any help would be greatly appreciated.

Deployment file. I have posted only 2 of the apps: service-2's HPA is working fine, whereas service-1's is not.

$ cat deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-1
  namespace: backend
  labels:
    app: service-1
spec:
  replicas: 1
  selector:
    matchLabels:
      lbtype: internal
  template:
    metadata:
      labels:
        lbtype: internal
        app: service-1
    spec:
      containers:
        - name: service-1
          image: [REDACTED]
          ports:
            - containerPort: [REDACTED]
              name: "[REDACTED]"
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "500m"
      imagePullSecrets:
      - name: docker-gcr
      restartPolicy: Always
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-2
  namespace: backend
  labels:
    app: service-2
spec:
  replicas: 2
  selector:
    matchLabels:
      lbtype: external
  template:
    metadata:
      labels:
        lbtype: external
        app: service-2
    spec:
      containers:
        - name: service-2
          image: [REDACTED]
          ports:
            - containerPort: [REDACTED]
              name: "[REDACTED]"
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
            limits:
              memory: "512Mi"
              cpu: "500m"
      imagePullSecrets:
      - name: docker-gcr
      restartPolicy: Always

HorizontalPodAutoscaler file:

$ cat horizontal-pod-scaling.yaml

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
 name: service-1
 namespace: backend
spec:
 scaleTargetRef:
   apiVersion: apps/v1
   kind: Deployment
   name: service-1
 minReplicas: 1
 maxReplicas: 2
 metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
 name: service-2
 namespace: backend
spec:
 scaleTargetRef:
   apiVersion: apps/v1
   kind: Deployment
   name: service-2
 minReplicas: 2
 maxReplicas: 4
 metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Service file:

$ cat service.yaml

apiVersion: v1
kind: Service
metadata:
  name: backend-internal
  namespace: backend
spec:
  type: ClusterIP
  ports:
    - name: service-1
      port: [REDACTED]
      targetPort: "[REDACTED]"
  selector:
    lbtype: internal
---
apiVersion: v1
kind: Service
metadata:
  name: backend-middleware
  namespace: backend
spec:
  ports:
    - name: service-2
      port: [REDACTED]
      targetPort: "[REDACTED]"
  selector:
    lbtype: external

$ kctl get hpa
NAME               REFERENCE                     TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
service-1          Deployment/service-1          <unknown>/70%   1         2         1          18h
service-2          Deployment/service-2          4%/70%          2         4         2          18h

$ kctl top pod
NAME                                CPU(cores)   MEMORY(bytes)
service-1-8f7dc66cc-xtz76               3m           66Mi
service-2-5fd767cbc-vm7f5               4m           76Mi

$ kubectl describe deployment metrics-server-v0.5.2 -nkube-system

Name:                   metrics-server-v0.5.2
Namespace:              kube-system
CreationTimestamp:      Fri, 02 Dec 2022 11:01:18 +0530
Labels:                 addonmanager.kubernetes.io/mode=Reconcile
                        k8s-app=metrics-server
                        version=v0.5.2
Annotations:            components.gke.io/layer: addon
                        deployment.kubernetes.io/revision: 4
Selector:               k8s-app=metrics-server,version=v0.5.2
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
...
Containers:
   metrics-server:
    Image:      gke.gcr.io/metrics-server:v0.5.2-gke.1
    Port:       10250/TCP
    Host Port:  10250/TCP
    Command:
      /metrics-server
      --metric-resolution=30s
      --kubelet-port=10255
      --deprecated-kubelet-completely-insecure=true
      --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
      --cert-dir=/tmp
      --secure-port=10250

$ kctl describe hpa service-1

Conditions:
  Type            Status  Reason                   Message
  ----            ------  ------                   -------
  AbleToScale     True    ReadyForNewScale         recommended size matches current size
  ScalingActive   False   FailedGetResourceMetric  the HPA was unable to compute the replica count: no recommendation
  ScalingLimited  False   DesiredWithinRange       the desired count is within the acceptable range
Events:
  Type     Reason                   Age                  From                       Message
  ----     ------                   ----                 ----                       -------
  Warning  FailedGetResourceMetric  2m (x4470 over 18h)  horizontal-pod-autoscaler  no recommendation

$ kctl describe hpa service-2

Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:           <none>

There are 2 answers

Veera Nagireddy On

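As per my understanding, ScalingActive=False with reason FailedGetResourceMetric means the HPA could not calculate a replica count from the CPU metric, so it will not scale service-1 until that metric becomes available.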

Check the possible solutions below:

1) Check the resource metric: Remove the limits from your Deployments and set only the relevant resource requests on the pods' containers; that may work. Once you see the HPA working, you can experiment with limits again. This discussion indicates that setting requests alone is sufficient for the HPA.

2) FailedGetResourceMetric: Check that the metric is registered and available (see also "Custom and external metrics"). Run kubectl top node and kubectl top pod -A to verify that metrics-server is working properly.

The HPA controller runs regularly to check whether any adjustments to the system are required. During each run, the controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. The controller manager obtains the metrics from either the resource metrics API (for per-pod resource metrics) or the custom metrics API (for all other metrics).

Basically, the HPA targets a Deployment by name and uses that Deployment's selector labels to find the pods whose metrics it reads. If two Deployments use the same selector, the HPA gets metrics for the pods of both Deployments. Try the same deployment on a kind cluster and it may work fine.
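To illustrate the selector point, here is a small Python sketch of matchLabels semantics (not the controller's actual code). The labels for service-1 and service-2 are taken from the manifests in the question; "service-3" is a hypothetical third app that I am assuming reuses lbtype: internal, since the other five Deployments were not posted. The function names are made up for this example.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, pod_labels: Labels) -> bool:
    """A matchLabels selector matches when every key/value pair is on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def pods_selected(selector: Labels, pods: Dict[str, Labels]) -> List[str]:
    """Return the names of all pods that the selector would pick up."""
    return sorted(name for name, labels in pods.items()
                  if selector_matches(selector, labels))


pods = {
    "service-1-pod": {"lbtype": "internal", "app": "service-1"},
    "service-2-pod": {"lbtype": "external", "app": "service-2"},
    # Hypothetical: another internal app sharing the lbtype label.
    "service-3-pod": {"lbtype": "internal", "app": "service-3"},
}

# Selector as posted for service-1: only lbtype is checked.
broad = pods_selected({"lbtype": "internal"}, pods)
# Selector that also includes the app label, so it is unique per Deployment.
narrow = pods_selected({"lbtype": "internal", "app": "service-1"}, pods)

print(broad)   # ['service-1-pod', 'service-3-pod']
print(narrow)  # ['service-1-pod']
```

If the other Deployments really do share lbtype: internal, adding app: service-1 to spec.selector.matchLabels would make the selection unique. Note that spec.selector is immutable on apps/v1 Deployments, so the Deployment has to be deleted and recreated to change it.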

3) Kubernetes Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes built-in autoscaling pipelines, and CPU/memory-based horizontal autoscaling depends on it. Check the requirements: Metrics Server has specific requirements for cluster and network configuration, and these are not the default for all cluster distributions. Please ensure that your cluster distribution supports them before using Metrics Server.

4) The HPA processes scale-up events every 15-30 seconds, and scaling may take around 3-4 minutes because of the latency of the metrics data.

5) Check this relevant SO question for more information.
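To make the arithmetic behind point 1 concrete, here is a rough Python sketch of how a CPU Utilization target is evaluated: usage as a percentage of the request, then the documented formula desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), clamped to minReplicas and maxReplicas. The function names are invented for illustration, and this is a simplification: the real controller also deals with unready pods, missing metrics and stabilization, which are ignored here. The inputs come from the kubectl top and get hpa output in the question.

```python
import math
from typing import List


def cpu_utilization_percent(usage_m: List[int], request_m: List[int]) -> int:
    """Total CPU usage as a percentage of total CPU requests (millicores)."""
    total_request = sum(request_m)
    if total_request == 0:
        # Without a CPU request, utilization is undefined and the HPA
        # cannot compute a value for a Utilization target.
        raise ValueError("no CPU request set on the selected pods")
    return sum(usage_m) * 100 // total_request


def desired_replicas(current: int, utilization: int, target: int,
                     min_replicas: int, max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Apply the replica formula, then clamp to the min/max bounds."""
    ratio = utilization / target
    if abs(ratio - 1.0) <= tolerance:
        desired = current  # close enough to the target: no change
    else:
        desired = math.ceil(current * ratio)
    return max(min_replicas, min(max_replicas, desired))


# service-2: 4m used against a 100m request, target 70%, 2..4 replicas.
util = cpu_utilization_percent([4], [100])
print(util)                                  # 4
print(desired_replicas(2, util, 70, 2, 4))   # 2

# service-1, if its metric were available: 3m used against a 100m request.
print(desired_replicas(1, cpu_utilization_percent([3], [100]), 70, 1, 2))  # 1
```

For service-2 the raw formula gives 1 replica, which is below minReplicas, and that matches the TooFewReplicas condition shown in the question. For service-1 the same arithmetic would work once the HPA can actually read the metric; the calculation itself is not the problem.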

Shreyas S On

A little late to answer this question, but it looks like the issue was with Google Kubernetes Engine itself. An incident was reported and fixed 3 months ago. Everything is working smoothly now.