Spreading 3 pods across 3 nodes and deploying without downtime


I have 3 k8s control plane nodes, each of which must run a haproxy pod.

DaemonSet approach: The obvious solution would be to deploy the haproxy pods as a DaemonSet, so each node gets exactly one pod. However, during a rollout of a new version there is downtime, because by default the old DaemonSet pod on a node is terminated before its replacement starts, so old and new pods never run concurrently.

Deployment approach: Another solution would be to deploy them as a Deployment. I define that I need 3 replicas of haproxy, and then I have to decide how to spread them across the nodes.
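
For context, the spread snippets below plug into a Deployment roughly like this (the names, tolerations and image tag are illustrative, not my exact manifest):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: haproxy
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: haproxy
      template:
        metadata:
          labels:
            app: haproxy
        spec:
          # tolerations so the pods are allowed onto the control plane nodes
          tolerations:
          - key: node-role.kubernetes.io/control-plane
            operator: Exists
            effect: NoSchedule
          # the affinity / topologySpreadConstraints snippets below go here,
          # under spec.template.spec
          containers:
          - name: haproxy
            image: haproxy:2.8   # illustrative tag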

  • I can't use strict anti-affinity with requiredDuringSchedulingIgnoredDuringExecution, because during a rolling update every node already runs an old pod, so the pods of the new version would never get scheduled. preferredDuringSchedulingIgnoredDuringExecution is only a soft preference, so the scheduler is free to ignore it and pile pods onto the same node.
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 1
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - haproxy
              topologyKey: "kubernetes.io/hostname"
  • I tried topologySpreadConstraints, but during a rollout there are terminating and starting pods in parallel, which causes the scheduler to assign pods to nodes unevenly: the constraint can't distinguish between pods in the Terminating state and pods in the Running state. The config below is what I use.
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            app: haproxy
        maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
  • I've read about the descheduler, but I'd rather not spend the extra resources and want to keep my cluster as predictable as possible.

  • I could run more replicas (pods) than nodes and hope that at least one pod lands on each node, but that's a waste of resources.

  • matchLabelKeys described here might work for distinguishing between new and old pods, but when I add it to my 1.26.5 cluster it doesn't seem to take effect.
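
For reference, this is roughly what I tried. As far as I can tell, matchLabelKeys in topologySpreadConstraints is still an alpha field behind the MatchLabelKeysInPodTopologySpread feature gate on 1.26 (it only becomes beta in 1.27), which would explain why it is silently dropped on my cluster:

    topologySpreadConstraints:
    - labelSelector:
        matchLabels:
          app: haproxy
      # pod-template-hash differs between the old and new ReplicaSet, so the
      # skew would be computed per revision instead of across both generations
      matchLabelKeys:
      - pod-template-hash
      maxSkew: 1
      topologyKey: kubernetes.io/hostname
      whenUnsatisfiable: DoNotSchedule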

What are my options here?

Thanks


There are 2 answers

Ron Etch

You can add tolerations under spec.tolerations that match the control-plane taint node-role.kubernetes.io/control-plane:NoSchedule, as explained in the documentation.

Below is a sample YAML file for reference:

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: 
  namespace: kube-system
  labels:
    k8s-app: 
spec:
  selector:
    matchLabels:
      name: 
  template:
    metadata:
      labels:
        name: 
    spec:
      tolerations:
      # these tolerations are to have the daemonset runnable on control plane nodes
      # remove them if your control plane nodes should not run pods
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      containers:
      - name: 
        image: ....

You can read more about DaemonSets in this documentation.
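
If the concern with the DaemonSet is downtime during the rollout, a surge-style update strategy may help: as far as I know the maxSurge field for DaemonSets went GA around Kubernetes 1.25, so it should be usable on 1.26.5. A rough sketch (note this only helps if the old and new haproxy pods can briefly run side by side on the same node, e.g. no conflicting hostPort or hostNetwork port):

    spec:
      updateStrategy:
        type: RollingUpdate
        rollingUpdate:
          # keep the old pod on a node until its replacement is up
          maxUnavailable: 0
          # start the new pod before the old one is removed
          maxSurge: 1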

Vladimir

I'm not sure how, but adding strategy.rollingUpdate.maxUnavailable: 0 appears to work well. The rollout takes longer, but the new pods are now assigned to the nodes evenly. I've tried this with 10+ rollouts so far, and every time it resulted in an even distribution.
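
For reference, this is the strategy block I mean; maxSurge: 1 just makes the default (25%, rounded up to 1 for 3 replicas) explicit, and the rest of the Deployment is as in the question:

    spec:
      strategy:
        type: RollingUpdate
        rollingUpdate:
          # never take an old pod down before its replacement is Ready
          maxUnavailable: 0
          # roll one pod at a time
          maxSurge: 1

My guess is that with maxUnavailable: 0 each old pod is removed only after its replacement is Running, so the scheduler never sees a node "freed up" by a terminating pod while it is placing the new ones.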