Delete Dataproc cluster in Airflow only if it exists


We have an ETL workflow as follows:

Check file > Create dataproc cluster > Perform ETL Operation > Delete Cluster

To save resources, we skip all steps if there are no files to process.

If there are files to process, we create a Dataproc cluster, and we delete it even if the ETL operation fails, to save on resource costs.

We are using Airflow 2.4, so we cannot use as_setup() and as_teardown() (available from Airflow 2.7).

So we need the Dataproc delete-cluster step to handle two cases:

  1. Be skipped if there are no files to process.
  2. Still delete the cluster if the ETL operation fails.
There is 1 answer

Andrey Anshin

If you can't use a newer version of Airflow, which also means you can't use the latest providers, you could:

  1. Build the logic around a branch operator (for example, BranchPythonOperator)
  2. Raise AirflowSkipException in an upstream task
  3. Raise AirflowSkipException in a pre_execute callback; see the parameters available on BaseOperator (this feature is still experimental)
  4. Use ShortCircuitOperator
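
As an illustration of option 4, here is a minimal sketch rather than a definitive implementation. It assumes Airflow 2.4 with a Google provider that ships the Dataproc operators. The project, region, cluster name, bucket path, and the local-directory file check are all hypothetical placeholders; a real check would more likely look at a GCS bucket. The idea is that ShortCircuitOperator skips every downstream task when no files are found, while trigger_rule=ALL_DONE on the delete task lets it run after the ETL step whether that step succeeded or failed.

```python
from datetime import datetime
from pathlib import Path

# Hypothetical values for illustration only; replace with your own.
PROJECT_ID = "my-project"
REGION = "us-central1"
CLUSTER_NAME = "etl-cluster"
INCOMING_DIR = "/data/incoming"


def has_files_to_process(directory: str = INCOMING_DIR) -> bool:
    """Return True when at least one regular file is waiting in `directory`."""
    path = Path(directory)
    return path.is_dir() and any(entry.is_file() for entry in path.iterdir())


def build_dag():
    """Wire the four steps together. Airflow is imported lazily so the
    file check above can be used and tested without Airflow installed."""
    from airflow import DAG
    from airflow.operators.python import ShortCircuitOperator
    from airflow.providers.google.cloud.operators.dataproc import (
        DataprocCreateClusterOperator,
        DataprocDeleteClusterOperator,
        DataprocSubmitJobOperator,
    )
    from airflow.utils.trigger_rule import TriggerRule

    with DAG(
        dag_id="dataproc_etl",
        start_date=datetime(2024, 1, 1),
        schedule=None,  # the `schedule` argument exists from Airflow 2.4
        catchup=False,
    ) as dag:
        # A falsy return value skips ALL downstream tasks, including the
        # delete step, because downstream trigger rules are ignored.
        check_files = ShortCircuitOperator(
            task_id="check_files",
            python_callable=has_files_to_process,
            ignore_downstream_trigger_rules=True,  # the default, shown for clarity
        )

        create_cluster = DataprocCreateClusterOperator(
            task_id="create_cluster",
            project_id=PROJECT_ID,
            region=REGION,
            cluster_name=CLUSTER_NAME,
            cluster_config={  # placeholder sizing
                "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
                "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
            },
        )

        run_etl = DataprocSubmitJobOperator(
            task_id="run_etl",
            project_id=PROJECT_ID,
            region=REGION,
            job={  # placeholder PySpark job
                "reference": {"project_id": PROJECT_ID},
                "placement": {"cluster_name": CLUSTER_NAME},
                "pyspark_job": {"main_python_file_uri": "gs://my-bucket/etl.py"},
            },
        )

        # ALL_DONE: run once upstream has finished, whether it succeeded or failed.
        delete_cluster = DataprocDeleteClusterOperator(
            task_id="delete_cluster",
            project_id=PROJECT_ID,
            region=REGION,
            cluster_name=CLUSTER_NAME,
            trigger_rule=TriggerRule.ALL_DONE,
        )

        check_files >> create_cluster >> run_etl >> delete_cluster

    return dag


# In a real DAG file, expose the DAG at module level so the scheduler finds it:
# dag = build_dag()
```

One thing to keep in mind with this layout: delete_cluster is the only leaf task, so the DAG run can end up marked as successful even when run_etl failed. If that matters, alerting on the run_etl task itself is one way to catch it.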