Allow scheduling multiple pods when anti-affinity is enabled


I have a deployment to which I have added the following affinity:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - example.com
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: component
          operator: In
          values:
          - myapp
      topologyKey: "kubernetes.io/hostname"

Now, whenever I update the configuration of this pod, the upgraded pod is not scheduled and fails with this error:

Warning FailedScheduling 12s default-scheduler 0/1 nodes are available: 1 node(s) didn't match pod anti-affinity rules. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod

How can I fix this issue?

I tried preferredDuringSchedulingIgnoredDuringExecution, but had no luck.

I have 3 nodes in my cluster.


There are 3 answers

Keyboard Corporation (best answer)

Given that you have 3 replicas and 3 nodes in your cluster, the pods are presumably spread one per node. However, when you update the configuration, the Deployment creates a new pod before removing an old one, and the Kubernetes scheduler tries to place it on a node where no other pod with the label component: myapp is running. Since every node already runs such a pod, the new pod cannot be scheduled, which produces the error message you're seeing.

To address this issue, consider the following options:

  1. Use preferredDuringSchedulingIgnoredDuringExecution for pod anti-affinity, which makes the anti-affinity rule a "soft" requirement rather than a "hard" one.

Example:

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: component
            operator: In
            values:
            - myapp
        topologyKey: "kubernetes.io/hostname"

  2. Adjust the maxUnavailable parameter in your deployment strategy.

Example:

strategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1

In the second example, maxUnavailable is set to 1, which means the Deployment controller may take one old pod down before its replacement is ready. Terminating that pod frees its node, so the new pod can be scheduled there without violating the anti-affinity rule.
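A minimal sketch of the full strategy block, assuming you also want to stop the Deployment from creating an extra (surge) pod first; the maxSurge: 0 line is an addition, not part of the answer above. With maxSurge at 0 the controller has to terminate an old pod before it creates a replacement, so a node is always free for the new pod. Kubernetes rejects maxSurge: 0 together with maxUnavailable: 0, so maxUnavailable must stay at 1 or more.

```yaml
# Fragment of a Deployment spec (sibling of replicas, selector and template).
# maxSurge: 0 is an assumption added for illustration.
strategy:
  type: RollingUpdate
  rollingUpdate:
    maxSurge: 0        # never run more pods than .spec.replicas during the rollout
    maxUnavailable: 1  # allow one old pod to be terminated first, freeing its node
```

The trade-off is that one replica is missing while each pod is being replaced.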

If neither of these solutions works, and your application does not require 3 replicas, you could scale it down to fewer than 3 pods (i.e., kubectl scale deployment myapp --replicas=2). This leaves one node without a myapp pod, so the new pod can be scheduled there.

Abel Matos

Maybe your case is better served by Pod Topology Spread Constraints.

Please check the documentation: https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/#comparison-with-podaffinity-podantiaffinity
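As a rough sketch, assuming the pods carry the label component: myapp from the question, a spread constraint replacing the required anti-affinity could look like this. With maxSkew: 1, a surge pod may land on a node that already runs one replica (per-node counts of 2, 1, 1 still differ by only 1), whereas the required anti-affinity rule forbids that outright. The values are illustrative, not taken from the linked page.

```yaml
# Fragment of the Deployment's pod template (spec.template.spec).
# Assumes the pods are labelled component: myapp, as in the question.
topologySpreadConstraints:
- maxSkew: 1                           # per-node pod counts may differ by at most 1
  topologyKey: kubernetes.io/hostname  # each node is its own topology domain
  whenUnsatisfiable: DoNotSchedule     # hard rule; ScheduleAnyway makes it a soft preference
  labelSelector:
    matchLabels:
      component: myapp
```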

Will

You can set:

strategy:
  rollingUpdate:
    maxUnavailable: 50%

By default this value is 25%, which in your case means 0 (25% of 3 is 0.75, which is rounded down to 0).

So you have 3 pods that must all run on 3 different nodes, and during a rollout you cannot remove any pod (maxUnavailable) while you also cannot schedule any additional pod (anti-affinity). The rollout cannot make progress.

You must either use preferredDuringSchedulingIgnoredDuringExecution to allow more than one pod to be scheduled on the same node, or use maxUnavailable to allow some pods to be removed and make room for the incoming ones. If preferredDuringSchedulingIgnoredDuringExecution doesn't work in your case, use the other option.

Note that this configuration leaves your app with only 2 pods running during a rollout rather than 3 (50% of 3 is 1.5, which is rounded down to 1).
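The rounding in this answer can be worked through explicitly for 3 replicas. The maxSurge line is an addition: per the Deployment documentation, maxSurge is converted from a percentage by rounding up, which is why the default rollout still tries to add a fourth pod that the anti-affinity rule then blocks.

```latex
\begin{aligned}
\text{maxUnavailable (default 25\%)} &= \lfloor 0.25 \times 3 \rfloor = \lfloor 0.75 \rfloor = 0 \\
\text{maxSurge (default 25\%)} &= \lceil 0.25 \times 3 \rceil = \lceil 0.75 \rceil = 1 \\
\text{maxUnavailable (50\%)} &= \lfloor 0.50 \times 3 \rfloor = \lfloor 1.5 \rfloor = 1 \\
\text{pods kept available during the rollout} &= 3 - 1 = 2
\end{aligned}
```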

.spec.strategy.rollingUpdate.maxUnavailable is an optional field that specifies the maximum number of Pods that can be unavailable during the update process. The value can be an absolute number (for example, 5) or a percentage of desired Pods (for example, 10%). The absolute number is calculated from percentage by rounding down. The value cannot be 0 if .spec.strategy.rollingUpdate.maxSurge is 0. The default value is 25%.

https://kubernetes.io/docs/concepts/workloads/controllers/deployment/#rolling-update-deployment