I have various DAGs scheduled, but one particular run of one DAG is never being triggered.
I am aware that Airflow runs a job at the end of the period, but surely I'm missing something.
I have a schedule defined as:
10 2,5,8,11,14,17,20,23 * * *, meaning my job should run every day at 02:10, 05:10, 08:10, 11:10, 14:10, 17:10, 20:10, and 23:10 UTC.
For some reason, the 23:10 UTC run is always skipped, and I don't understand why.
Airflow runs my 20:10 run, skips 23:10, and then continues with 02:10.
So my question is: why is this run always skipped?
My default DAG arguments are as follows:
from datetime import timedelta

from airflow import DAG
from airflow.utils.dates import days_ago

default_args = {
"owner": "whir",
"depends_on_past": False,
"start_date": days_ago(0, hour=0, minute=0, second=0, microsecond=0),
"email": [""],
"email_on_failure": False,
"email_on_retry": False,
"retries": 4,
"retry_delay": timedelta(minutes=30),
}
with DAG(
'transfer-data',
default_args=default_args,
description="Transfer data",
schedule_interval='10 2,5,8,11,14,17,20,23 * * *',
catchup=True
) as dag:
...
OK, my guess for what's going wrong here is that your start_date parameter should be in the DAG definition, not in default_args. Move it out of your default args and instead add it to your DAG definition, like the sketch below. Airflow is very particular about DAG definitions, as this can sometimes cause unexpected behavior in the metadata database on the backend.
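Something roughly like this (the start_date value below is just illustrative - use whatever date your DAG should actually begin from, and keep in mind that with catchup=True Airflow will backfill every interval between that date and now):

from datetime import datetime, timedelta

from airflow import DAG

default_args = {
    "owner": "whir",
    "depends_on_past": False,
    "email": [""],
    "email_on_failure": False,
    "email_on_retry": False,
    "retries": 4,
    "retry_delay": timedelta(minutes=30),
}

with DAG(
    'transfer-data',
    default_args=default_args,
    description="Transfer data",
    schedule_interval='10 2,5,8,11,14,17,20,23 * * *',
    # start_date now lives on the DAG itself and is a fixed date in the past
    # rather than the moving value that days_ago(0) produces
    start_date=datetime(2021, 1, 1),
    catchup=True,
) as dag:
    ...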
start_date is a parameter set at the DAG level - you're stating when the DAG as a whole should begin. It isn't something you pass to each individual task, which is what default_args is meant for. It's hard to tell just from what you've posted, but my guess is that the start date gets reset around midnight, and that's why every run works except the 23:10 one: that run is only triggered at 02:10 the next day (the end of its interval), and by then the start date has already moved past it.
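For reference, that midnight reset is exactly what days_ago(0) does: it gets re-evaluated every time the scheduler parses your DAG file and always comes back as midnight of the current day in UTC. A quick illustration:

from airflow.utils.dates import days_ago

# days_ago(0) returns "today at 00:00 UTC", so the value moves forward
# every day the scheduler re-parses the DAG file.
print(days_ago(0))  # e.g. 2021-06-15 00:00:00+00:00 if run on 2021-06-15

Pinning start_date to a fixed date in the past avoids that moving target entirely.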