Why is it recommended against using a dynamic start_date in Airflow?


I've read Airflow's FAQ entry "What's the deal with start_date?", but it still isn't clear to me why using a dynamic start_date is discouraged.

To my understanding, a DAG's execution_date is determined by the minimum start_date among all of the DAG's tasks, and subsequent DAG Runs are run at the latest execution_date + schedule_interval.

If I set my DAG's default_args start_date to be for, say, yesterday at 20:00:00, with a schedule_interval of 1 day, how would that break or confuse the scheduler, if at all? If I understand correctly, the scheduler would trigger the DAG with an execution_date of yesterday at 20:00:00, and the next DAG Run would be scheduled for today at 20:00:00.

Is there some concept that I'm missing?


There are 2 answers

7
liferacer On

The first run happens at start_date + schedule_interval. Airflow doesn't run the DAG at start_date itself; it always waits until start_date + schedule_interval.

As the documentation mentions, if you give a dynamic start_date, e.g. datetime.now(), together with some schedule_interval (say 1 hour), that run will never execute: now() moves along with time, so the clock never reaches datetime.now() + 1 hour.
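To make that concrete, here is a rough sketch in plain Python (no Airflow import). It is only a simplified model of the behavior described above, not Airflow's actual scheduler code, and every name in it (first_run_due, run_is_triggered, the 10-minute check interval) is made up for illustration. It assumes the DAG file is re-evaluated on every scheduler check, so a datetime.now() start_date is recomputed each time.

```python
from datetime import datetime, timedelta

SCHEDULE_INTERVAL = timedelta(hours=1)


def first_run_due(start_date, schedule_interval=SCHEDULE_INTERVAL):
    """Earliest moment the first run can trigger: the end of the first interval."""
    return start_date + schedule_interval


def run_is_triggered(get_start_date, checks):
    """Return True if any simulated scheduler check finds the first run due.

    get_start_date(now) stands in for re-parsing the DAG file at time `now`
    (an assumption of this sketch, not an Airflow API).
    """
    for now in checks:
        start_date = get_start_date(now)
        if now >= first_run_due(start_date):
            return True
    return False


# Simulated scheduler checks: every 10 minutes for two days.
t0 = datetime(2024, 1, 2, 12, 0)
checks = [t0 + timedelta(minutes=10 * i) for i in range(2 * 24 * 6)]

# Static start_date: a fixed point in the past.
static_start = datetime(2024, 1, 1, 20, 0)
static_triggered = run_is_triggered(lambda now: static_start, checks)

# Dynamic start_date: behaves like start_date=datetime.now().
dynamic_triggered = run_is_triggered(lambda now: now, checks)

print(static_triggered)   # True
print(dynamic_triggered)  # False
```

With the fixed start_date the condition now >= start_date + schedule_interval is met on the first check. With the dynamic one it compares now against now + 1 hour, which can never be true no matter how long the loop runs.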

0
dgies On

The scheduler expects to see a constant start date and interval. If you change either one, the scheduler might not notice until it reloads the DagBag, and if the new start date doesn't line up with your old schedule, it might break depends_on_past behavior.
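A small sketch of what "doesn't line up" means, assuming a timedelta schedule_interval so that the run slots are anchored at start_date. The helper schedule_slots and the dates are hypothetical, and this is plain datetime arithmetic rather than anything from Airflow. How depends_on_past actually reacts to the mismatch depends on the Airflow version and settings, so this only shows that the two sets of slots never coincide.

```python
from datetime import datetime, timedelta


def schedule_slots(start_date, schedule_interval, count):
    """First `count` execution dates on the grid anchored at start_date."""
    return [start_date + i * schedule_interval for i in range(count)]


interval = timedelta(days=1)

# Runs that already exist under the old start_date (daily at 20:00).
old_slots = schedule_slots(datetime(2024, 1, 1, 20, 0), interval, 5)

# Slots implied by a changed start_date (daily at 09:30).
new_slots = schedule_slots(datetime(2024, 1, 3, 9, 30), interval, 5)

# No slot is shared between the two grids.
shared = set(old_slots) & set(new_slots)
print(shared)

# The slot "one interval before" the first new run was never an old run,
# so there is no earlier run at the date the new schedule would expect.
previous_slot = new_slots[0] - interval
print(previous_slot in old_slots)
```

Here shared is empty and the membership check is False: none of the existing runs sits on the new grid.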

If you don't need depends_on_past, the simplest option might be to stop using the scheduler, set the start date to some arbitrary old date, and trigger the DAG externally however you like, using a crontab or similar.