Team, I need some assistance.
I'd like to schedule pods based on a node condition that is reported in
kubectl get node node1 -o yaml
Can we do node-condition-based scheduling, the way we have nodeSelector like below?
nodeSelector:
  team.nb/service: services
  nodeType: cpu
Basically, if a k8s worker node reports the condition below in its node YAML, how can I set up a Helm chart to match that?
- lastHeartbeatTime: "2020-09-29T00:06:24Z"
  lastTransitionTime: "2020-08-16T12:47:16Z"
  message: kubelet has no disk pressure
  reason: KubeletHasNoDiskPressure
  status: "False"
  type: DiskPressure
Can we build a Helm chart that schedules pods based on this value, so that pods land on a node only if the condition below is true? I was searching Google but could not find a clue yet.
nodeConditions:
  reason: KubeletHasNoDiskPressure
  status: True
You probably want to use resource constraints to request the disk space you actually need, or better still, avoid needing "lots" of local disk space.
The Kubernetes documentation on Assigning Pods to Nodes describes the options available to you: pinning a pod to a specific node, matching by node labels, or advising placement based on the presence of other known pods. It also notes that you usually don't need any of these constraints, since the scheduler automatically makes a reasonable placement. None of these mechanisms match on node conditions such as DiskPressure.
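For reference, label-based placement in a pod spec looks roughly like the sketch below. The team.nb/service label is taken from the question, and this assumes you have actually applied that label to the nodes you care about; it matches node labels, not node conditions:

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: team.nb/service   # assumed node label from the question
              operator: In
              values:
                - services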
A cluster can be configured to make "ephemeral storage" a resource type; this includes disk used by the container filesystem. If you know that you'll need "a lot" of disk space and your cluster is configured for this, you can put an appropriate resource request in your YAML file:
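A minimal sketch of such a request, assuming your cluster tracks ephemeral-storage; the container name, image, and sizes are placeholders to adjust to what your workload actually uses:

spec:
  containers:
    - name: app                                 # hypothetical container name
      image: registry.example.com/app:latest    # placeholder image
      resources:
        requests:
          ephemeral-storage: "10Gi"   # placeholder: what you expect to use
        limits:
          ephemeral-storage: "20Gi"   # placeholder: hard cap before eviction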
If you really need that much storage, a volume is a better match. If you have multiple replicas, you might need a StatefulSet to automatically create a PersistentVolumeClaim per replica (sketched below). Note that an emptyDir volume counts against the ephemeral-storage limit and doesn't get around this.

If a volume doesn't seem right either, it's worth looking at why you need that much storage. If you need scratch space for a very large computation, you'll have to put that somewhere. When I've hit trouble like this, it was a debug-level log file growing out of control, and the right answer was to turn down the log level.
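Picking up the StatefulSet point, a minimal sketch; it assumes a headless Service named scratch-worker already exists and that the cluster has a default StorageClass, and the names, image, and storage size are placeholders:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: scratch-worker              # hypothetical name
spec:
  serviceName: scratch-worker       # assumed headless Service
  replicas: 3
  selector:
    matchLabels:
      app: scratch-worker
  template:
    metadata:
      labels:
        app: scratch-worker
    spec:
      containers:
        - name: worker
          image: registry.example.com/worker:latest   # placeholder image
          volumeMounts:
            - name: scratch
              mountPath: /scratch   # per-replica persistent scratch space
  volumeClaimTemplates:
    - metadata:
        name: scratch
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi           # placeholder size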
Just letting the pod or node crash is a legitimate option! If a pod crashes, it will generally get restarted (though on the same node, which is why specifying the resource limits correctly is important). If a node fails, the pods on it will get recreated on different nodes. If you have a cluster autoscaler and there are pods that can't be scheduled because of their resource requests, the autoscaler will add nodes and those pods can be rescheduled onto them.