I'm trying to use the DaemonSet rolling update mechanism to update pods automatically whenever the DaemonSet's spec.template field changes. I intentionally set an invalid image for the pods so that they could not start correctly, expecting the rolling update to stop once the number of unavailable pods exceeded the number defined in maxUnavailable. Unfortunately, that doesn't happen: pods keep being updated until all of them end up in CrashLoopBackOff.
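For reference, the update strategy in play is a stock RollingUpdate with the default budget; this matches the updateStrategy in the full spec further down:

    updateStrategy:
      rollingUpdate:
        maxUnavailable: 1   # my expectation: the rollout pauses once more than one pod is unavailable
      type: RollingUpdate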
I ran my test in a 3-node environment: kubectl get node -A

    NAME                         STATUS   ROLES    AGE   VERSION
    wdc-rdops-vm05-dhcp-74-190   Ready    <none>   65d   v1.18.0
    wdc-rdops-vm05-dhcp-86-61    Ready    master   65d   v1.18.0
    wdc-rdops-vm05-dhcp-93-214   Ready    <none>   65d   v1.18.0
I found a similar thread (How to automatically stop rolling update when CrashLoopBackOff?), but that one is about a Deployment, whereas mine is a DaemonSet.
As suggested in that thread, I've added

    spec:
      minReadySeconds: 120

so that a container has to run healthily for a while before its pod is counted as available rather than unavailable.
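The rollout can be watched against these settings with standard commands along these lines (the -l selector matches this DaemonSet's matchLabels):

    kubectl rollout status daemonset/nsx-node-agent -n nsx-system
    kubectl get pods -n nsx-system -l component=nsx-node-agent -w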
However, the update still proceeded. At this point the first replaced pod had crashed while the other two were still running the old spec:

    nsx-system   nsx-node-agent-9cl2v   0/3   CrashLoopBackOff   3   23s
    nsx-system   nsx-node-agent-c95wb   3/3   Running            3   11m
    nsx-system   nsx-node-agent-p58vs   3/3   Running            3   11m
The first deployed pod was unhealthy for more than 120 seconds, so it should have been counted as unavailable. However, the update was not stopped as expected; it kept going until all pods had been replaced and crashed:

    nsx-system   nsx-node-agent-9cl2v   0/3   CrashLoopBackOff   45   15m
    nsx-system   nsx-node-agent-6mlmq   0/3   CrashLoopBackOff   48   2m46s
    nsx-system   nsx-node-agent-9fzcc   0/3   CrashLoopBackOff   57   2m59s
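The controller's own availability accounting can be read straight from the DaemonSet status; this jsonpath query (standard kubectl, just one way to surface the counters that also appear in the status section of the full YAML below) reports the same numbers:

    kubectl get ds nsx-node-agent -n nsx-system \
      -o jsonpath='{.status.numberUnavailable} of {.status.desiredNumberScheduled} unavailable{"\n"}'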
The complete DaemonSet spec YAML, from kubectl get ds -n nsx-system nsx-node-agent -o yaml (managedFields trimmed):

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      creationTimestamp: "2021-02-21T11:28:03Z"
      generation: 101
      labels:
        component: nsx-node-agent
        tier: nsx-networking
        version: v1
      name: nsx-node-agent
      namespace: nsx-system
      resourceVersion: "14594084"
      selfLink: /apis/apps/v1/namespaces/nsx-system/daemonsets/nsx-node-agent
      uid: e3dd0951-1b31-4095-8c27-56ec9780d94e
    spec:
      minReadySeconds: 120
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          component: nsx-node-agent
          tier: nsx-networking
          version: v1
      template:
        metadata:
          annotations:
            container.apparmor.security.beta.kubernetes.io/nsx-node-agent: localhost/node-agent-apparmor
          creationTimestamp: null
          labels:
            component: nsx-node-agent
            tier: nsx-networking
            version: v1
        spec:
          containers:
          - command:
            - start_node_agent
            env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: CONTAINER_NAME
              value: nsx-node-agent
            image: registry.access.redhat.com/ubi8/ubi:latest
            imagePullPolicy: IfNotPresent
            livenessProbe:
              exec:
                command:
                - /bin/sh
                - -c
                - check_pod_liveness nsx-node-agent 5
              failureThreshold: 5
              initialDelaySeconds: 60
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 5
            name: nsx-node-agent
            resources: {}
            securityContext:
              capabilities:
                add:
                - NET_ADMIN
                - SYS_ADMIN
                - SYS_PTRACE
                - DAC_READ_SEARCH
                - NET_RAW
                - AUDIT_WRITE
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/nsx-ujo
              name: projected-volume
              readOnly: true
            - mountPath: /var/run/openvswitch
              name: openvswitch
            - mountPath: /var/run/nsx-ujo
              name: var-run-ujo
            - mountPath: /host/var/run/netns
              mountPropagation: HostToContainer
              name: netns
            - mountPath: /host/proc
              name: proc
              readOnly: true
            - mountPath: /var/lib/kubelet/device-plugins/
              name: device-plugins
              readOnly: true
            - mountPath: /host/etc/os-release
              name: host-os-release
              readOnly: true
            - mountPath: /var/log/nsx-ujo
              name: host-var-log-ujo
          - command:
            - start_kube_proxy
            env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.name
            - name: CONTAINER_NAME
              value: nsx-kube-proxy
            image: registry.access.redhat.com/ubi8/ubi:latest
            imagePullPolicy: IfNotPresent
            livenessProbe:
              exec:
                command:
                - /bin/sh
                - -c
                - check_pod_liveness nsx-kube-proxy 5
              failureThreshold: 5
              initialDelaySeconds: 10
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 5
            name: nsx-kube-proxy
            resources: {}
            securityContext:
              capabilities:
                add:
                - NET_ADMIN
                - SYS_ADMIN
                - SYS_PTRACE
                - DAC_READ_SEARCH
                - NET_RAW
                - AUDIT_WRITE
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/nsx-ujo
              name: projected-volume
              readOnly: true
            - mountPath: /var/run/openvswitch
              name: openvswitch
            - mountPath: /var/log/nsx-ujo
              name: host-var-log-ujo
          - command:
            - start_ovs
            image: registry.access.redhat.com/ubi8/ubi:latest
            imagePullPolicy: IfNotPresent
            livenessProbe:
              exec:
                command:
                - /bin/sh
                - -c
                - check_pod_liveness nsx-ovs 10
              failureThreshold: 3
              initialDelaySeconds: 10
              periodSeconds: 10
              successThreshold: 1
              timeoutSeconds: 10
            name: nsx-ovs
            resources: {}
            securityContext:
              capabilities:
                add:
                - NET_ADMIN
                - SYS_ADMIN
                - SYS_NICE
                - SYS_MODULE
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
            volumeMounts:
            - mountPath: /etc/nsx-ujo
              name: projected-volume
              readOnly: true
            - mountPath: /etc/openvswitch
              name: var-run-ujo
              subPath: openvswitch-db
            - mountPath: /var/run/openvswitch
              name: openvswitch
            - mountPath: /sys
              name: host-sys
              readOnly: true
            - mountPath: /host/etc/openvswitch
              name: host-original-ovs-db
            - mountPath: /lib/modules
              name: host-modules
              readOnly: true
            - mountPath: /host/etc/os-release
              name: host-os-release
              readOnly: true
            - mountPath: /var/log/openvswitch
              name: host-var-log-ujo
              subPath: openvswitch
            - mountPath: /var/log/nsx-ujo
              name: host-var-log-ujo
          dnsPolicy: ClusterFirst
          hostNetwork: true
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          serviceAccount: nsx-node-agent-svc-account
          serviceAccountName: nsx-node-agent-svc-account
          terminationGracePeriodSeconds: 60
          tolerations:
          - effect: NoSchedule
            key: node-role.kubernetes.io/master
          - effect: NoSchedule
            key: node.kubernetes.io/not-ready
          - effect: NoSchedule
            key: node.kubernetes.io/unreachable
          volumes:
          - name: projected-volume
            projected:
              defaultMode: 420
              sources:
              - configMap:
                  items:
                  - key: ncp.ini
                    path: ncp.ini
                  name: nsx-node-agent-config
              - configMap:
                  items:
                  - key: version
                    path: VERSION
                  name: nsx-ncp-version-config
          - hostPath:
              path: /var/run/openvswitch
              type: ""
            name: openvswitch
          - hostPath:
              path: /var/run/nsx-ujo
              type: ""
            name: var-run-ujo
          - hostPath:
              path: /var/run/netns
              type: ""
            name: netns
          - hostPath:
              path: /proc
              type: ""
            name: proc
          - hostPath:
              path: /var/lib/kubelet/device-plugins/
              type: ""
            name: device-plugins
          - hostPath:
              path: /var/log/nsx-ujo
              type: DirectoryOrCreate
            name: host-var-log-ujo
          - hostPath:
              path: /sys
              type: ""
            name: host-sys
          - hostPath:
              path: /lib/modules
              type: ""
            name: host-modules
          - hostPath:
              path: /etc/openvswitch
              type: ""
            name: host-original-ovs-db
          - hostPath:
              path: /etc/os-release
              type: ""
            name: host-os-release
      updateStrategy:
        rollingUpdate:
          maxUnavailable: 1
        type: RollingUpdate
    status:
      currentNumberScheduled: 3
      desiredNumberScheduled: 3
      numberMisscheduled: 0
      numberReady: 0
      numberUnavailable: 3
      observedGeneration: 101
      updatedNumberScheduled: 3
The ds output is as below: kc get ds -n nsx-system -w

    NAME             DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    nsx-node-agent   3         3         0       3            0           <none>          64d
I don't understand why Kubernetes didn't stop the update when the number of unavailable pods exceeded maxUnavailable: 1.

In addition, we can see the pods' age is far greater than minReadySeconds.

Seemingly, Kubernetes' rolling update strategy doesn't follow the defined spec? It shouldn't allow this situation to occur during a rolling update.
I don't see readiness probes defined in your manifests. Without readiness probes, Kubernetes will consider a pod to be "ready" as soon as the process is running, and will proceed with terminating other pods during a RollingUpdate.
A failing readiness probe on one pod with maxUnavailable set to 1 should stop the update, but if there is no such probe, there's nothing informing the cluster that the pod is not actually ready to accept traffic.
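As a minimal sketch, a readiness probe for the nsx-node-agent container could reuse the check_pod_liveness helper that your liveness probes already call (whether that script is also meaningful as a readiness check is an assumption on my part):

    readinessProbe:
      exec:
        command:
        - /bin/sh
        - -c
        - check_pod_liveness nsx-node-agent 5   # assumption: the liveness helper doubles as a readiness check
      initialDelaySeconds: 10
      periodSeconds: 10
      failureThreshold: 3

If a probe like this keeps failing on the first updated pod, that pod never becomes ready, minReadySeconds never elapses for it, and the rollout should hold at maxUnavailable: 1 instead of marching on.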