I'm confused about the architecture of Apache Airflow.
As far as I know, when you execute an HQL or Sqoop statement in Oozie, Oozie directs the request to the data nodes.
I want to achieve the same thing in Apache Airflow: execute a shell script, HQL, or Sqoop command, and be sure that the command is executed in a distributed way by the data nodes. Airflow has different executor types. What should I do in order to run commands on different nodes concurrently?
It seems you want to execute your tasks on distributed workers. In that case, consider using the
CeleryExecutor
. See: https://airflow.apache.org/configuration.html#scaling-out-with-celery
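
A minimal sketch of the relevant `airflow.cfg` settings, assuming you use Redis as the Celery broker and a shared metadata database reachable from every worker node (the hostnames and credentials below are placeholders):

```ini
[core]
# Switch from the default executor to Celery so tasks are
# dispatched to remote workers instead of running locally.
executor = CeleryExecutor

[celery]
# Message broker that distributes task messages to the workers
# (Redis here; RabbitMQ is another common choice).
broker_url = redis://my-redis-host:6379/0
# Backend where Celery workers report task state; typically the
# same database Airflow uses for its metadata.
result_backend = db+postgresql://airflow:airflow@my-db-host/airflow
```

With this in place, you start a worker process (`airflow worker` in the 1.x series this documentation link refers to) on each node you want to participate, and the scheduler will distribute queued tasks among them concurrently.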