How to change the default kube-scheduler in Kubernetes

This doc explains how to run multiple schedulers, but I'm not able to understand how the default scheduler is decided. Is it based on the --leader-elect option?

Can I tell Kubernetes to use my-custom-scheduler as the default scheduler instead of kube-scheduler? Is there a way to specify the scheduler other than schedulerName in the Pod/Deployment spec?

There are 4 answers

Arghya Sadhu On BEST ANSWER

How is the default scheduler decided? Is it based on the --leader-elect option?

No, it's not based on --leader-elect. That flag is for running multiple replicated copies of the same scheduler with leader election enabled, so that only one replica acts as the leader at any given point in time.
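
For context, here is an excerpt of a typical kubeadm-generated /etc/kubernetes/manifests/kube-scheduler.yaml (the flags and image tag vary by cluster version); note that --leader-elect only governs which replica is active:

apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - name: kube-scheduler
    image: k8s.gcr.io/kube-scheduler:v1.19.0   # tag varies by cluster
    command:
    - kube-scheduler
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true   # HA leader election only; does not make this scheduler "the default"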

Can I tell Kubernetes to use my-custom-scheduler as the default scheduler instead of kube-scheduler?

You don't need to change the default scheduler at the cluster level, because you can tell Kubernetes to use your custom scheduler in the pod spec. Here is an example that uses my-scheduler instead of default-scheduler:

apiVersion: v1
kind: Pod
metadata:
  name: pod-with-custom-scheduler
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-with-custom-scheduler
    image: k8s.gcr.io/pause:2.0

The pod above will be scheduled by my-scheduler instead of the default kube-scheduler. If you omit schedulerName, the pod will be scheduled by the default kube-scheduler.

From the docs:

By default, one profile with the scheduler name default-scheduler is created. This profile includes the default plugins described above. When declaring more than one profile, a unique scheduler name for each of them is required.

If a Pod doesn't specify a scheduler name, kube-apiserver will set it to default-scheduler. Therefore, a profile with this scheduler name should exist to get those pods scheduled.
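
To illustrate the quoted text, here is a minimal sketch of a KubeSchedulerConfiguration with two profiles (the apiVersion depends on your Kubernetes version, and my-scheduler plus the disabled plugin are only examples):

apiVersion: kubescheduler.config.k8s.io/v1beta1   # use v1 on newer clusters
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler   # keeps pods without a schedulerName schedulable
- schedulerName: my-scheduler
  plugins:
    score:
      disabled:
      - name: NodeResourcesBalancedAllocation   # example per-profile tweak

A single scheduler binary serves both profiles, so pods that name neither one still land on default-scheduler.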

So you could also just replace the existing kube-scheduler with your own scheduler, registered under the name default-scheduler:

1. Replace the Docker image of kube-scheduler with your image in /etc/kubernetes/manifests/kube-scheduler.yaml (sketched below), or
2. Edit the kube-scheduler deployment and change the image.
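
A sketch of option 1, assuming your scheduler is published as example.com/my-scheduler:v1 (a placeholder image name):

# In /etc/kubernetes/manifests/kube-scheduler.yaml
spec:
  containers:
  - name: kube-scheduler
    image: example.com/my-scheduler:v1   # placeholder; the binary must serve the default-scheduler profile name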
Dashrath Mundkar On

First, make sure you configure the custom scheduler, and set --leader-elect=false to disable leader election for it (appropriate when you run a single replica). Then you can reference it in the pod spec like this:

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  schedulerName: your-scheduler-name
  containers:
  - image: nginx
    name: nginx
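
The same schedulerName field also works in a Deployment's pod template, which covers the Pod/Deployment part of the question. A minimal sketch (names are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-custom-scheduled
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      schedulerName: your-scheduler-name
      containers:
      - name: nginx
        image: nginx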
842Mono On

I also needed to replace the default Kubernetes scheduler with a custom one. Here's how I did it.

I think this is the main piece of the answer. I moved (or removed) the file /etc/kubernetes/manifests/kube-scheduler.yaml, which disables (or removes) the default Kubernetes scheduler. You can verify that it is gone by running kubectl get po -n kube-system | grep -i scheduler before and after removing the file.

Now that the default scheduler is disabled, a custom scheduler (a Python script) does the scheduling; I just run it. The script is below. It's not very clean, but it should work; tweak it as you wish. Note that I didn't try running the script after I cleaned it up, so minor errors may exist.

#!/usr/bin/env python

import time
import random
import json

from kubernetes import client, config, watch

config.load_kube_config()
v1 = client.CoreV1Api()

def get_request_time(hostname):
    # You can do some magic here.
    print("returning 1.2")
    return 1.2

def best_request_time(nodes):
    if not nodes:
        return []
    node_times = [get_request_time(hostname) for hostname in nodes]
    best_node = nodes[node_times.index(min(node_times))]  # node with the lowest request time
    print("Best node: " + best_node)
    return best_node


def nodes_available():
    ready_nodes = []
    for n in v1.list_node().items:
        # Loop over the available nodes; we try to schedule the pod on one of them.
        for status in n.status.conditions:
            if status.status == "True" and status.type == "Ready":
                ready_nodes.append(n.metadata.name)
    return ready_nodes


def scheduler(name, node, namespace="<YOUR-NAMESPACE-HERE>"):  # You can use "default" as the namespace.
    # Bind the named pod to the chosen node via the Binding subresource.
    target = client.V1ObjectReference()
    target.kind = "Node"
    target.apiVersion = "v1"
    target.name = node
    meta = client.V1ObjectMeta()
    meta.name = name
    body = client.V1Binding(target=target)
    body.metadata = meta
    return v1.create_namespaced_binding(namespace, body, _preload_content=False)

def main():
    w = watch.Watch()
    for event in w.stream(v1.list_namespaced_pod, "<YOUR-NAMESPACE-HERE>"):
        # We get an "event" whenever a pod needs to be scheduled
        if event['object'].status.phase == "Pending": # and event['object'].spec.scheduler_name == scheduler_name:
            try:
                arg2 = best_request_time(nodes_available())
                print("Scheduling " + event['object'].metadata.name)
                res = scheduler(event['object'].metadata.name, arg2)
            except client.rest.ApiException as e:
                print("exception")
                print(json.loads(e.body)['message'])

if __name__ == '__main__':
    main()
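
To try it out (with the namespace placeholders filled in and the default scheduler's manifest moved away as described above), run the script and create an unscheduled pod in the watched namespace; the script should bind it to a node. A minimal test pod, with arbitrary name and image:

apiVersion: v1
kind: Pod
metadata:
  name: test-custom-scheduling
spec:
  containers:
  - name: nginx
    image: nginx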
jbg On

If you can't remove or replace the default scheduler in the control plane (e.g. if you are on a managed Kubernetes platform), you can use Gatekeeper, the successor to the plain OPA integration, or another policy agent to write mutations that are applied to some or all pods on your cluster.

For example:

apiVersion: mutations.gatekeeper.sh/v1beta1
kind: Assign
metadata:
  name: pod-scheduler-name
spec:
  applyTo:
  - groups: [""]
    kinds: ["Pod"]
    versions: ["v1"]

  match:
    kinds:
    - apiGroups: ["*"]
      kinds: ["Pod"]

    # Adjust this to a label that is present on the pods of your custom scheduler.
    # It's important that you leave your custom scheduler to be itself scheduled by the
    # default scheduler, as otherwise if all pods of your custom scheduler somehow get
    # terminated, they won't be able to start up again due to not being scheduled.
    labelSelector:
      matchExpressions:
      - key: app
        operator: NotIn
        values: ["my-scheduler"]

  location: "spec.schedulerName"

  # Adjust this to match the desired profile name from your scheduler's configuration.
  parameters:
    assign:
      value: my-scheduler