K8s Pod Anti Affinity for Cronjob Pod Even Scheduling


In a multi-tenant scenario with 500 namespaces, each containing an identical CronJob labeled app=some-job, and 20 worker nodes, is it possible to coerce the Kubernetes scheduler into spreading the 500 CronJob Pods evenly across the 20 nodes, so that any node has only ~25 Completed and/or Running Pods at a given time?

I've noticed that the 500 CronJob Pods tend to be scheduled on only about 7 of the 20 nodes, and the KubeletTooManyPods alarm fires even though most of the Pods are in the Completed state.

I'm considering a Pod anti-affinity rule on the label app=some-job with topologyKey=kubernetes.io/hostname, but I'm not sure whether it takes Completed Pods into account, or whether it would still spread Pods evenly once all 20 nodes have at least one Pod on them. At that point every node would violate the anti-affinity rule, but I hope preferredDuringSchedulingIgnoredDuringExecution would let scheduling continue with an even spread.
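For reference, a minimal sketch of the rule described above, written against the batch/v1beta1 CronJob API served by Kubernetes 1.17. The schedule, image, and tenant namespace names are placeholders. An anti-affinity term only matches Pods in the incoming Pod's own namespace unless `namespaces` is set, so with one CronJob per namespace the other tenants' namespaces would have to be listed explicitly. Whether this actually yields an even spread is exactly the open question.

```yaml
# Sketch only: schedule, image, and namespace names are placeholders.
apiVersion: batch/v1beta1        # CronJob API version available on Kubernetes 1.17
kind: CronJob
metadata:
  name: some-job
spec:
  schedule: "0 * * * *"          # placeholder schedule
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: some-job        # label the anti-affinity term selects on
        spec:
          restartPolicy: Never
          affinity:
            podAntiAffinity:
              # "preferred" is a soft rule: it only influences node scoring,
              # so Pods still get scheduled when every node already has a match.
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        app: some-job
                    topologyKey: kubernetes.io/hostname
                    # Without this list, only Pods in this Pod's own
                    # namespace are considered. Names are hypothetical.
                    namespaces:
                      - tenant-001
                      - tenant-002
          containers:
            - name: some-job
              image: example.com/some-job:latest   # placeholder image
```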

Is there a better way to achieve this spread, maybe a custom scheduler?

Edit: We're using EKS 1.17.


There is 1 answer

Answer by Oliver (best answer)

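The presence of Completed Pods from finished Jobs does not affect the scheduling logic, so I doubt Pod topology spread constraints (topologySpreadConstraints) will help. You are better off using the CronJob history limits, .spec.successfulJobsHistoryLimit and .spec.failedJobsHistoryLimit (kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/…), so that finished Jobs and their Pods are cleaned up instead of accumulating on the nodes.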
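A minimal sketch of what that could look like on a 1.17 cluster, where CronJob is served as batch/v1beta1. The schedule and image are placeholders, and the limit values are only an example; pick whatever retention you actually need.

```yaml
# Sketch only: schedule, image, and limit values are illustrative.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: some-job
spec:
  schedule: "0 * * * *"            # placeholder schedule
  successfulJobsHistoryLimit: 1    # keep one finished successful Job (default is 3)
  failedJobsHistoryLimit: 1        # keep one failed Job for debugging (default is 1)
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: some-job
        spec:
          restartPolicy: Never
          containers:
            - name: some-job
              image: example.com/some-job:latest   # placeholder image
```

Setting a limit to 0 keeps no finished Jobs of that kind. When a Job is removed, the Pods it owns are deleted with it, which is what brings the per-node Pod count down.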

One of your comments indicates you need the logs. In that case, upload the Pod's logs as part of the Job itself: at the end of the script run by the CronJob, push them to S3, Fluent Bit, or wherever you keep logs. That way the logs are safe once the CronJob run completes. Pod logs can disappear for various reasons (they can be cleared, Pods can be evicted or deleted, etc.), so it is not a good idea to rely on Completed Pods still being around to access them.
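As a rough sketch of the S3 variant, assuming the image ships bash and the AWS CLI and the Pod has credentials that allow writing to the bucket. The script path /app/run-job.sh, the bucket name example-job-logs, and the image are all hypothetical.

```yaml
# Sketch only: image, script path, and bucket name are hypothetical.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: some-job
spec:
  schedule: "0 * * * *"            # placeholder schedule
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: some-job
        spec:
          restartPolicy: Never
          containers:
            - name: some-job
              image: example.com/some-job:latest   # must contain bash and the AWS CLI
              env:
                - name: POD_NAMESPACE
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.namespace
              command: ["/bin/bash", "-c"]
              args:
                - |
                  # pipefail makes the pipeline report the job's failure, not tee's success
                  set -o pipefail
                  /app/run-job.sh 2>&1 | tee /tmp/job.log
                  status=$?
                  # Upload even when the job failed; HOSTNAME is the Pod name
                  aws s3 cp /tmp/job.log "s3://example-job-logs/${POD_NAMESPACE}/${HOSTNAME}.log"
                  # Preserve the job's own exit code so the Job status stays accurate
                  exit "$status"
```

The output still goes to stdout through tee, so kubectl logs keeps working for as long as the Pod exists, but nothing depends on it afterwards. In this sketch a failed upload does not change the exit code; whether it should is a design choice.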