Kubernetes node-condition-based scheduling using a Helm chart


Team, I need some assistance.

I want to schedule pods based on the node conditions reported by

kubectl get node node1 -o yaml

Can we do node-condition-based scheduling, the way we use nodeSelector like below?

nodeSelector:
  team.nb/service: services
  nodeType: cpu

Basically, if a k8s worker node reports the condition below in its node YAML, how can I write a Helm chart to match it?

  - lastHeartbeatTime: "2020-09-29T00:06:24Z"
    lastTransitionTime: "2020-08-16T12:47:16Z"
    message: kubelet has disk pressure
    reason: KubeletHasNoDiskPressure
    status: "false"
    type: DiskPressure

Can we build a Helm chart that uses this value for scheduling pods, such that pods land on a node only if the below is true?

    reason: KubeletHasNoDiskPressure
    status: True

I searched Google but could not find a clue yet.

There are 2 answers

David Maze

You probably want to use resource constraints to request the disk space you actually need, or better still, avoid needing "lots" of local disk space.

The Kubernetes documentation on Assigning Pods to Nodes notes the options you have available to you: pinning a pod to a specific node, matching by labels, or advising placement based on the presence of other known pods. It also notes that

Generally such constraints are unnecessary, as the scheduler will automatically do a reasonable placement (e.g. spread your pods across nodes, not place the pod on a node with insufficient free resources, etc.)....

A cluster can be configured to make "ephemeral storage" a resource type; this includes disk used by the container filesystem. If you know that you'll need "a lot" of disk space and your cluster is configured for this, you can put an appropriate resource request in your YAML file:

  - name: main
    resources:
      requests:
        ephemeral-storage: 20Gi
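Since the question is about a Helm chart, the same request could be driven from chart values. This is only a minimal sketch: the value names `image` and `ephemeralStorageRequest` are hypothetical, and a real chart would more likely template a Deployment than a bare Pod.

```yaml
# templates/pod.yaml (sketch; assumes values.yaml defines the hypothetical keys
#   image: example.com/app:1.0
#   ephemeralStorageRequest: 20Gi)
apiVersion: v1
kind: Pod
metadata:
  name: {{ .Release.Name }}-main
spec:
  containers:
    - name: main
      image: {{ .Values.image | quote }}
      resources:
        requests:
          # The scheduler only places the pod on a node with this much
          # allocatable ephemeral storage left.
          ephemeral-storage: {{ .Values.ephemeralStorageRequest | quote }}
```

With this, the scheduler does the filtering by available disk, so the chart never needs to look at the node's DiskPressure condition.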

If you really need that much storage, a volume is a better match. If you have multiple replicas you might need a StatefulSet to automatically create a PersistentVolumeClaim per replica. Note that an emptyDir volume counts against the ephemeral-storage limit and doesn't get around this.
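As a rough illustration of the StatefulSet approach, the manifest below asks for one PersistentVolumeClaim per replica through `volumeClaimTemplates`. All names, the image, the mount path, and the 20Gi size are placeholders, and it assumes a headless Service called `worker` and a default StorageClass exist in the cluster.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: worker                 # hypothetical name
spec:
  serviceName: worker          # headless Service, assumed to exist
  replicas: 3
  selector:
    matchLabels:
      app: worker
  template:
    metadata:
      labels:
        app: worker
    spec:
      containers:
        - name: main
          image: example.com/worker:1.0   # placeholder image
          volumeMounts:
            - name: data
              mountPath: /data            # placeholder path
  # One PVC named data-worker-0, data-worker-1, ... is created per replica.
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi
```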

If a volume doesn't seem right either, it's worth looking at why you need that much storage. If you need scratch space for a very large computation, you'll have to put that somewhere. When I've hit trouble like this, it was a debug-level log file growing out of control and the right answer was to turn down the log level.

Just letting the pod or node crash is a legitimate option! If a pod crashes, it will generally get restarted (though on the same node, so specifying the resource limits correctly is important). If a node fails, the pods on it will get recreated on different nodes. If you have a cluster autoscaler and there are pods that can't be scheduled due to their resource constraints, the autoscaler will add nodes and the pods can get scheduled there.

Howard_Roark

The scheduler already does something like this: if a node has disk pressure, the node controller will taint it so that the scheduler won't place anything else on it. In other words, if a node has no disk pressure, the scheduler will place pods there, which seems to be what you want?

The node controller automatically taints a Node when certain conditions are true. The following taints are built in:

node.kubernetes.io/disk-pressure: Node has disk pressure.
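For illustration, this is roughly what that taint looks like on a node, followed by the toleration a pod would need only if it should be scheduled there anyway. It is a sketch: the pod name and image are placeholders, and the Node object is trimmed to the relevant field.

```yaml
# Fragment of a Node while it reports DiskPressure.
# The taint is added automatically; you do not set it yourself.
apiVersion: v1
kind: Node
metadata:
  name: node1
spec:
  taints:
    - key: node.kubernetes.io/disk-pressure
      effect: NoSchedule
---
# A pod WITHOUT this toleration is kept off the tainted node,
# which is the behavior the question asks for.
apiVersion: v1
kind: Pod
metadata:
  name: tolerant-pod             # hypothetical name
spec:
  tolerations:
    - key: node.kubernetes.io/disk-pressure
      operator: Exists
      effect: NoSchedule
  containers:
    - name: main
      image: example.com/app:1.0 # placeholder image
```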

You can also adjust your kubelet configuration so that it starts evicting pods once less than a given percentage of the disk is left, which controls when disk pressure is reported.
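A kubelet configuration along these lines sets the disk-related eviction thresholds. The percentages are example values only, not recommendations, and how the file reaches the kubelet depends on how your nodes are provisioned.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  # Evict pods when the node filesystem has less than 10% free space.
  nodefs.available: "10%"
  # Evict pods when the image filesystem has less than 15% free space.
  imagefs.available: "15%"
```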

I know that your question was about condition-based scheduling in general, but your specific example of scheduling pods only on nodes without disk pressure is handled out of the box, so there is no need to implement it yourself.