Helo i am running a .NET application in Azure Kubernetes Services as a 3 pod cluster (1 pod per node).
I am trying to understand how can i make my cluster elastic depending on load ?
How can i configure the deployment.yaml
so that after a certain %
of the cpu utilization and/or %
of memory per pod it spawns another pod? The same thing when load decreases, how do i shut down instances.
Is there any guide/tutorial to set this up based on percentage (ideally) ?
The basic feature you need to use is called
HorizontalPodAutoscaler
or for shortHPA
. There you can configure cpu or memory limits and if the limit is exceeded, the pod replica number will be increased. E.g. from this walkthrough:This will scale out the php-apache deployment, as soon as the pods cpu utilization is greater than 50 %. Be aware that calculating the resource utilization and the resulting number of replicas is not as intuitive, as it might seam. Also see docs (the whole page should be quite interesting too). You can also combine criteria for scale out.
There are also addons that help you scale based on other parameters, like the number of messages in a queue. Check out keda, they provide different scalers, like RabbitMQ, Kafka, AWS CloudWatch, Azure Monitor, etc.
And since you wrote
you might be running a
DaemonSet
. In that case your only option to scale out would be to add additional nodes, since with daemonsets there is always exactly one pod per node. If that's the case you could think about using aDeployment
combined with aPodAntiAffinity
instead, see docs. By that you can configure pods to preferably run on nodes where pods of the same deployment are not running yet, e.g.:From docs:
That would make scale out more flexible as it is with a daemonset, yet you have a similar effect of pods being equally distributed through out the cluster.
If you want/need to stick to a daemonset you can check out the AKS Cluster Autoscaler, that can be used to automatically add/remove additional nodes from your cluster, based on resource consumption.