Kubernetes HPA doesn't scale down after decreasing the loads

2.7k views Asked by At

the Kubernetes HPA works correctly when load of the pod increased but after the load decreased, the scale of deployment doesn't change. This is my HPA file:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: baseinformationmanagement
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: baseinformationmanagement
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80

My kubernetes version:

> kubectl version
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-02T17:01:15Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-18T23:22:30Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}

And this is my HPA describe:

> kubectl describe hpa baseinformationmanagement
Name:                                                     baseinformationmanagement
Namespace:                                                default
Labels:                                                   <none>
Annotations:                                              kubectl.kubernetes.io/last-applied-configuration:
                                                            {"apiVersion":"autoscaling/v2beta2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"baseinformationmanagement","name...
CreationTimestamp:                                        Sun, 27 Sep 2020 06:09:07 +0000
Reference:                                                Deployment/baseinformationmanagement
Metrics:                                                  ( current / target )
  resource memory on pods  (as a percentage of request):  49% (1337899008) / 70%
  resource cpu on pods  (as a percentage of request):     2% (13m) / 50%
Min replicas:                                             1
Max replicas:                                             3
Deployment pods:                                          2 current / 2 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource utilization (percentage of request)
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:           <none>
1

There are 1 answers

2
David Maze On BEST ANSWER

Your HPA specifies both memory and CPU targets. The Horizontal Pod Autoscaler documentation notes:

If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done for each metric, and then the largest of the desired replica counts is chosen.

The actual replica target is a function of the current replica count and the current and target utilization (same link):

desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]

For memory in particular: currentReplicas is 2; currentMetricValue is 49; desiredMetricValue is 80. So the target replica count is

desiredReplicas = ceil[       2        * (         49        /         80         )]
desiredReplicas = ceil[       2        *                   0.61                    ]
desiredReplicas = ceil[                          1.26                              ]
desiredReplicas = 2

Even if your service is totally idle, this will cause there to be (at least) 2 replicas, unless the service chooses to release memory back to the OS; that's usually up to the language runtime and a little out of your control.

Just removing the memory target and autoscaling based only on CPU might match better what you expect.