Problem
Is there any way in Airflow to create a workflow such that the number of tasks B.* is unknown until completion of Task A? I have looked at subdags but it looks like it can only work with a static set of tasks that have to be determined at Dag creation.
Would dag triggers work? And if so could you please provide an example.
I have an issue where it is impossible to know the number of task B's that will be needed to calculate Task C until Task A has been completed. Each Task B.* will take several hours to compute and cannot be combined.
|---> Task B.1 --|
|---> Task B.2 --|
Task A ------|---> Task B.3 --|-----> Task C
| .... |
|---> Task B.N --|
Idea #1
I don't like this solution because I have to create a blocking ExternalTaskSensor and all the Task B.* will take between 2-24 hours to complete. So I do not consider this a viable solution. Surely there is an easier way? Or was Airflow not designed for this?
Dag 1
Task A -> TriggerDagRunOperator(Dag 2) -> ExternalTaskSensor(Dag 2, Task Dummy B) -> Task C
Dag 2 (Dynamically created DAG though python_callable in TriggerDagrunOperator)
|-- Task B.1 --|
|-- Task B.2 --|
Task Dummy A --|-- Task B.3 --|-----> Task Dummy B
| .... |
|-- Task B.N --|
Edit 1:
As of now this question still does not have a great answer. I have been contacted by several people looking for a solution.
Depending on the context, this could be implemented in an asynchronous batch workers style. "Dynamic tasks" can be treated as list of work items to be done and split into asynchronous messages published into external message broker queue for worker nodes to pick up.
One tasks generates "work" dynamically and publishes all items (we don't know in advance how many and even which exactly) into a topic/queue.
Workers consume "work tasks" from the queue. Either directly if implemented using external to Airflow technology, or as as Airflow Sensor task (maybe in a separate DAG). When they finish processing their task, the Airflow Sensor gets triggered and the execution flow continues.
To restore the flow for individual work items, think about using EIP Claim Check pattern.