Airflow scheduler ends up stuck when restarting during long running job


I have an Airflow scheduler started with the setting --run-duration 86400 (24 hours). After this time the scheduler "dies" and is automatically restarted. This works fine as long as there is no long-running job (say 2 hours) that was started shortly before the 24 hours were over. In that situation I end up with one scheduler process working on my long-running task and all the others as zombies (defunct). No other jobs are processed for the duration of the long-running job.

I’m working with LocalExecutor.

My question is:

  • Is it safe to let the scheduler run indefinitely (without --run-duration or --num_runs)?
  • Does something similar happen with the Celery or Dask executor?
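For reference, one common way to run the scheduler indefinitely is to drop --run-duration entirely and leave restarts to a process supervisor. A minimal systemd unit sketch (the paths and unit name here are assumptions, not from the question):

```ini
# /etc/systemd/system/airflow-scheduler.service (illustrative)
[Unit]
Description=Airflow scheduler
After=network.target

[Service]
# No --run-duration / --num_runs: the scheduler runs until it exits or fails
ExecStart=/usr/local/bin/airflow scheduler
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

With Restart=on-failure the scheduler is only restarted when it actually crashes, so a long-running task never forces the blocked-restart situation described above.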

1 answer

Shlomo:

As I understand it, when running jobs with the LocalExecutor the actual task is run by a sub-process of the scheduler, so the scheduler has to wait until all of its tasks finish before it can restart. During this time no new tasks will start.
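The defunct processes from the question can be reproduced with plain process mechanics: a finished child stays a zombie until its parent calls wait() on it. A minimal Linux-only sketch (using os.fork directly, not Airflow itself):

```python
import os
import time

# The parent plays the role of the scheduler under LocalExecutor:
# tasks are its child processes, and it must reap them itself.
pid = os.fork()
if pid == 0:
    os._exit(0)          # the "task" finishes immediately

time.sleep(0.2)          # parent is "busy" and has not reaped the child yet

# On Linux, field 3 of /proc/<pid>/stat is the process state;
# 'Z' is the zombie state shown as <defunct> in ps output.
with open(f"/proc/{pid}/stat") as f:
    state = f.read().split()[2]
print("child state:", state)   # 'Z' while unreaped

os.waitpid(pid, 0)       # reaping clears the zombie entry
```

A scheduler blocked waiting on one long-running child is in exactly this position with respect to its other, already-finished children: they sit as defunct entries until it gets around to reaping them.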

Under the CeleryExecutor, the Celery workers are separate processes that actually run the tasks, so you shouldn't run into this issue.