We have an ETL workflow as follows:
Check file > Create Dataproc cluster > Perform ETL operation > Delete cluster
To save resources, we skip all steps if there are no files to process.
If there are files to process, we create the Dataproc cluster, and even if the ETL operation fails we still delete the cluster to save on resource costs.
We are using Airflow 2.4, so we cannot use as_setup() and as_teardown() (available from Airflow 2.7).
Here we need the Dataproc delete-cluster step to perform a dual role:
- skip, if there are no files to process;
- delete the cluster (to free resources) in case of ETL failure.
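
For reference, a rough sketch of the DAG shape we have in mind; the task ids, project/bucket names, and cluster config below are illustrative placeholders, not our real setup:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)

with DAG(
    dag_id="etl_with_dataproc",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Step 1: decide whether there is anything to process (placeholder logic).
    check_file = PythonOperator(
        task_id="check_file",
        python_callable=lambda: print("look for input files here"),
    )

    # Step 2: spin up the Dataproc cluster (minimal illustrative config).
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id="my-project",
        region="us-central1",
        cluster_name="etl-cluster",
        cluster_config={"worker_config": {"num_instances": 2}},
    )

    # Step 3: run the ETL job on the cluster (illustrative PySpark job).
    perform_etl = DataprocSubmitJobOperator(
        task_id="perform_etl",
        project_id="my-project",
        region="us-central1",
        job={
            "placement": {"cluster_name": "etl-cluster"},
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/etl.py"},
        },
    )

    # Step 4: tear the cluster down again.
    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id="my-project",
        region="us-central1",
        cluster_name="etl-cluster",
    )

    check_file >> create_cluster >> perform_etl >> delete_cluster
```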
If you can't use newer versions of Airflow, which also means you can't use the latest providers, you could:
- raise an AirflowSkipException in the upstream task
- raise an AirflowSkipException in a pre_execute callback (see the available parameters on BaseOperator; still an experimental feature)
- use a ShortCircuitOperator (see the sketch below)
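
A minimal sketch of the ShortCircuitOperator option, assuming the input files land in a GCS bucket (the bucket name, prefix, and the EmptyOperator stand-ins for the Dataproc tasks are placeholders): when the callable returns False, the ShortCircuitOperator skips every downstream task, delete_cluster included, because ignore_downstream_trigger_rules defaults to True since Airflow 2.3; when it returns True, trigger_rule="all_done" makes delete_cluster run even if the ETL task fails.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator
from airflow.operators.python import ShortCircuitOperator
from airflow.providers.google.cloud.hooks.gcs import GCSHook
from airflow.utils.trigger_rule import TriggerRule


def _files_to_process() -> bool:
    # Illustrative check: True only if at least one object sits under the prefix.
    return bool(GCSHook().list("my-bucket", prefix="incoming/"))


with DAG(
    dag_id="etl_with_dataproc_short_circuit",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # When the callable returns False, every downstream task is skipped,
    # including delete_cluster, regardless of its trigger rule.
    check_file = ShortCircuitOperator(
        task_id="check_file",
        python_callable=_files_to_process,
        ignore_downstream_trigger_rules=True,  # default since Airflow 2.3
    )

    # Stand-ins for the DataprocCreateClusterOperator / DataprocSubmitJobOperator
    # tasks shown in the sketch above.
    create_cluster = EmptyOperator(task_id="create_cluster")
    perform_etl = EmptyOperator(task_id="perform_etl")

    # Runs whether perform_etl succeeds or fails, so the cluster is removed
    # once it has been created; it is still skipped when check_file short-circuits.
    delete_cluster = EmptyOperator(
        task_id="delete_cluster",
        trigger_rule=TriggerRule.ALL_DONE,
    )

    check_file >> create_cluster >> perform_etl >> delete_cluster
```

One trade-off to keep in mind: with trigger_rule="all_done", delete_cluster also runs if create_cluster itself fails, so the delete step should tolerate a cluster that was never created.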