How to design a distributed job scheduler?

30.6k views Asked by At

I want to design a job scheduler cluster, which contains several hosts to do cron job scheduling. For example, a job which needs run every 5 minutes is submitted to the cluster, the cluster should point out which host to fire next run, making sure:

  1. Disaster tolerance: if not all of the hosts are down, the job should be fired successfully.
  2. Validity: only one host to fire next job run.

Due to disaster tolerance, job cannot bind to a specific host. One way is all the hosts polling a DB table(certainly with lock), this guaranteed only one host gets the next job run. Since it often locks table, is there any better design?

6

There are 6 answers

0
Marco Haschka On

I did require something like this long ago, when synchronisation was done with floppy disks. You should be clear about three things, which seem to be simple, but in distributed environment the arent :-)

"Synchronisation Sections" If you get a net split, which means your cluster is split in two seperate sections wich can communicate inside the sections, but not between the two sections, the "fire the job exactly once" can only acquired per synchronisation section.

"Disaster" If almost all times all computers are up and running and only very seldom one fails, and the failure of two is almost unthinkable, its a completely different thing, than every host is running only part time, the connections are unstable, or the synchronisation is done by dial-up connections or by floppys. If you want even deal with a net split, it becomes really really complicated. If you want to deal with malicious hosts, you have another Problem.

"Validity" Fire every job exactly once... you have to synchronize faster than the job firing interval.

edit: Tipp for scheduler-tasks design. I have a big text file, wich contains lines. Every line is a job task, starting with job-type, then time to execute, then command and last but not least a optional resubmission-interval for repeating tasks. Syncing means merging. Executed tasks are deleted. If resubmission is on, then a new task is inserted or appended.

In an ideal world, every host ist allways connected to the others, I would implement something like a token ring. If there is no master, one is selected by the hosts, and the master is expected to schedule everything until he is not sending heardbeats for some time. If there are two masters, they negotiate for one of them to become master(maybe lower MAC-Adress... whatever).

If you have to deal with malicious hosts, you can use some byzantine gerenals-problem solution. The selection of the master is allready pretty good proofed against malicious hosts. With a little bit of rsa-krypto the selected master can signature every command, resend attacks can be treated with timestamps or growing indices... voila.

only as a story from an onld programmer, not intended for today everything is allways connected to the internet world: My big problem about 20 years ago was, that the hosts were synchronized from once a hour and once a day to once a week or once a month. So the solution was to have different commands: 1. execute on every host at a given date (wich is far enough in the future for synchronisation) 2. execute on a host, where "whoami" contains a certain substring. 3. execute on a random host with little probability, and send an acknowledgement to all others, that it is allready executed.

The third command-type does something like "fire only once", if the synchronisation is much faster than the probability of execution. It needs no master-slave architecture and it works pretty well, if you know the synchronisation intervalls.

3
Stefan On

Use the Quartz framework for that. It has a cron like syntax, can be clustered and only one of the hosts in the cluster will do one job at a time. If a host or job fails, another host will retry the pending job.

1
mvmn On

I'm not sure how to design one, but there are open-source products that do that which can serve as an example. One is Quartz scheduler that is mentioned above.

But, apparently, WallmartLabs have evaluated Quartz, found it to be not good enough, and thus created and open-sourced a better (in their opinion) alternative to it called BigBen. Perhaps you could also look at that one.

2
gnurik On

Check out Chronos (https://mesos.github.io/chronos/) which runs on top of Mesos - (https://mesos.apache.org/) resource scheduler.

2
Maxim Fateev On

Consider using AWS Simple Workflow Service if you are OK with using AWS web services. The benefit over something like Quartz is that it doesn't depend on database which you have to host and it can provide much more than scheduling. For example it can run some activities that fix your cluster or page you if scheduling is not possible for any reason. Here is an example of a cron workflow.

1
shcherbak On

I googled out the Dkron (Distributed job scheduling system). It has rest api and looks good. I plan try to use it Dkron site