Dynamically scale pods as new nodes are added to a k8s cluster

I am building an application in k8s where I want the replicas of deployments/statefulsets to scale with the number of nodes added.

Initially, the deployment should come up with 1 replica when the 1st node is created and grow as we add more worker/master nodes. Once the max is reached, it should stop growing. I am using local storage and I don't want all the statefulset pods getting scheduled on a single node.

Assume I have a deployment where I expect 2 replicas to run. Only one should come up when the 1st node is launched. Finally, when I have a 3-node master, it should have 2 replicas running on 2 nodes.

Is there any way I can achieve this? TIA

Answered by weibeld

There are various options.

DaemonSet

If you want exactly one replica of your app on every worker node, you can use a DaemonSet. However, I guess you want only up to a certain number of replicas, in which case a DaemonSet isn't a solution for your use case.

Pod anti-affinity

You can define a Pod anti-affinity for the Pods of your Deployment with a requiredDuringSchedulingIgnoredDuringExecution type and a topologyKey referring to a label whose value is different on each node. In this way, no two Pods of your Deployment will be scheduled to the same node.

For example, if you define three replicas in your Deployment, and there are only two worker nodes available, then two replicas will be scheduled on these two worker nodes and the third replica will remain pending until a third worker node is created, in which case it will be scheduled to this node.
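As a rough illustration of the anti-affinity approach, the sketch below builds such a Deployment manifest as a Python dictionary and prints it as JSON, which kubectl accepts in place of YAML. The function name build_deployment, the app name "my-app" and the image "nginx" are placeholders chosen for this example, not something from the question. It assumes the well-known kubernetes.io/hostname node label as the topologyKey, since that label normally has a distinct value on every node.

```python
import json


def build_deployment(name: str, image: str, max_replicas: int) -> dict:
    """Return a Deployment manifest whose Pods refuse to share a node.

    `name`, `image` and `max_replicas` are illustrative parameters.
    """
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            # Upper bound: replicas beyond the node count stay Pending.
            "replicas": max_replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "affinity": {
                        "podAntiAffinity": {
                            "requiredDuringSchedulingIgnoredDuringExecution": [
                                {
                                    # Repel Pods carrying this Deployment's own labels.
                                    "labelSelector": {"matchLabels": labels},
                                    # Assumed per-node label; one Pod per hostname.
                                    "topologyKey": "kubernetes.io/hostname",
                                }
                            ]
                        }
                    },
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }


if __name__ == "__main__":
    # Pipe the output into `kubectl apply -f -` to try it on a test cluster.
    print(json.dumps(build_deployment("my-app", "nginx", 2), indent=2))
```

With max_replicas set to 2, a one-node cluster would run one Pod and keep the second Pending until another schedulable node joins, which matches the behaviour described in the question.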

Operator

The most flexible solution is creating an operator. In this case, you create a new custom resource which encodes your desired deployment behaviour (e.g. the desired maximum number of replicas). You do this by defining a custom resource definition (CRD). You then create an operator which is an application that interacts with the Kubernetes API and enforces this behaviour.

At runtime, this may then look as follows:

  • You create an instance of your custom resource → the operator becomes active, checks the declared number of replicas in the custom resource, checks the number of available worker nodes, and creates the appropriate number of replicas.
  • You add an additional node to the cluster → the operator becomes active, checks if there are any pending replicas in the instances of your custom resource, and if so, schedules one of them to the new node.
  • You remove a node from the cluster → the operator becomes active and makes sure the replicas on the removed node are not scheduled to another node but just become pending until a new node is created.

Since an operator can implement arbitrary logic, you can extend this behaviour in any way you want.
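The core decision such an operator makes can be sketched as a pure function, shown here without any Kubernetes API calls. The Node class and the desired_replicas function are hypothetical names invented for this example. A real operator would watch node events and update the workload through the API. Note one simplification: instead of leaving replicas pending when a node disappears, this sketch lowers the replica count to the number of usable nodes.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Node:
    """Hypothetical, minimal view of a cluster node."""
    name: str
    ready: bool = True
    schedulable: bool = True


def desired_replicas(max_replicas: int, nodes: Sequence[Node]) -> int:
    """One replica per usable node, capped at the declared maximum."""
    usable = sum(1 for n in nodes if n.ready and n.schedulable)
    return min(max_replicas, usable)


if __name__ == "__main__":
    cluster = [Node("node-1")]
    print(desired_replicas(2, cluster))  # one node, so one replica

    cluster += [Node("node-2"), Node("node-3")]
    print(desired_replicas(2, cluster))  # three nodes, capped at the max of 2
```

Running this reconciliation step on every node add or remove event would give the grow-until-max behaviour from the question.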