Airflow - Distribute flow execution on multiple workers


I have a DAG that looks like this:

from airflow import DAG
from airflow.operators.python_operator import PythonOperator

with DAG(
    dag_id=DAG_ID,
    schedule_interval='@once',
    default_args=default_args,
) as dag:
    
    def function_1(value):
        return value+1
        
    def function_2(value):
        return value+2
        
    values_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    
    for value in values_list:
        execute_func_1 = PythonOperator(
            task_id=f"{str(value)}_pyth_op_1",
            python_callable=function_1,
            op_kwargs={'value': value}
        )
        
        execute_func_2 = PythonOperator(
            task_id=f"{str(value)}_pyth_op_2",
            python_callable=function_2,
            op_kwargs={'value': value}
        )
        
        execute_func_1 >> execute_func_2

It executes everything in parallel just as intended, but on only one worker (out of 3), and that worker runs out of memory. If I increase the worker's memory, it works fine.

Is there any way to distribute the execution on multiple workers?


1 Answer

Answered by Yusuf Quazi

IIUC, if you are concerned about workers going OOM while executing tasks and you don't want to increase worker memory, then something worth your time is `worker_concurrency` [1]. With this override you can control how many active tasks a worker can handle at a given time.

For instance, if it is set to the default, try reducing the number a bit and see what works best for you.

[1] https://airflow.apache.org/docs/apache-airflow/1.10.9/configurations-ref.html#worker-concurrency
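As a sketch of how to apply this (assuming the CeleryExecutor, which is what `worker_concurrency` applies to; the value `4` here is just an illustration), the setting can be lowered either in `airflow.cfg` or through the equivalent environment variable:

```shell
# Illustrative example: cap each Celery worker at 4 concurrent tasks
# (the Airflow 1.10.x default is 16). In airflow.cfg this lives under:
#
#   [celery]
#   worker_concurrency = 4
#
# or, equivalently, set the environment variable before starting the worker:
export AIRFLOW__CELERY__WORKER_CONCURRENCY=4
airflow worker
```

With a lower per-worker cap, the scheduler has to spread the queued tasks across the remaining workers instead of packing them all onto one, which also reduces each worker's peak memory use.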