In a multi-tenant scenario with 500 namespaces, each with an identical CronJob labeled app=some-job, and 20 worker nodes, is it possible to coerce the k8s scheduler to spread the 500 CronJob Pods evenly across the 20 nodes, such that any node would only have ~25 Completed and/or Running Pods at a given time?
I've noticed that the 500 CronJob Pods tend to be scheduled on only roughly 7 of the 20 nodes, and the KubeletTooManyPods alarm fires, even though most of the Pods are in the Completed state.
I'm thinking a solution could be to apply a Pod anti-affinity on the label app=some-job, with topologyKey=kubernetes.io/hostname. But I'm not sure whether the anti-affinity rule counts Completed Pods, or whether it would still produce an even spread once all 20 nodes had at least one Pod on them, at which point every node would fail the anti-affinity check. I hope preferredDuringSchedulingIgnoredDuringExecution would allow scheduling to continue with an even spread; a sketch of what I mean is below.
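This is roughly the rule I have in mind, placed in the CronJob's pod template; the weight value is an arbitrary choice on my part:

```yaml
# Inside the CronJob's pod template: spec.jobTemplate.spec.template.spec
affinity:
  podAntiAffinity:
    # "preferred" rather than "required", so scheduling can still succeed
    # once every node already has a Pod with this label
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100          # arbitrary; higher weight = stronger preference
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: some-job
        topologyKey: kubernetes.io/hostname
```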
Is there a better way to achieve this spread, maybe a custom scheduler?
Edit: Wanted to mention we're using EKS 1.17.
Edit 2: Typo
The presence of Completed Jobs does not affect the scheduling logic, so I doubt podTopologySpreadConstraints will help. You are better off using history limits (kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/…).

One of your comments indicates you need the logs: upload the logs of the Pod as part of the Job, i.e. at the end of the script run by the CronJob, push them to S3 or Fluent Bit or wherever. Then you are guaranteed the logs are safe once the CronJob completes. Job logs can disappear for various reasons (they can be cleared, Pods can get evicted or deleted, etc.), so it is not a good idea to rely on the presence of Completed Jobs to access them. A sketch combining both suggestions is below.
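A minimal sketch of a CronJob applying both ideas, assuming the aws CLI is available in the image; the schedule, image, script path, and bucket name are all hypothetical placeholders (CronJob is still batch/v1beta1 on Kubernetes 1.17):

```yaml
apiVersion: batch/v1beta1      # batch/v1 only became stable in 1.21
kind: CronJob
metadata:
  name: some-job
spec:
  schedule: "*/15 * * * *"               # hypothetical schedule
  successfulJobsHistoryLimit: 1          # default is 3; keep at most 1 Completed Job
  failedJobsHistoryLimit: 1              # and at most 1 failed Job per namespace
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            app: some-job
        spec:
          restartPolicy: OnFailure
          containers:
          - name: some-job
            image: some-registry/some-job:latest   # hypothetical image
            command:
            - /bin/sh
            - -c
            # Run the job, capture its output, ship the log to S3 before
            # the Pod completes, then exit with the job's own status
            # (bucket and paths are hypothetical)
            - |
              /app/run-job.sh > /tmp/job.log 2>&1
              status=$?
              aws s3 cp /tmp/job.log "s3://some-log-bucket/some-job/$(date +%s).log"
              exit $status
```

With low history limits, Completed Pods are pruned quickly, which keeps them from accumulating against the per-node pod limit; the upload step means pruning no longer costs you the logs.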