Airflow DataProcPySparkOperator not considering clusters outside the global region


I am performing some operations using DataProcPySparkOperator. This operator only takes a cluster name as a parameter; there is no option to specify a region, and by default it assumes the cluster is in the global region. For clusters in regions other than global, the following error occurs:

googleapiclient.errors.HttpError: https://dataproc.googleapis.com/v1/projects//regions/global/jobs:submit?alt=json returned "No current cluster for project id '' with name ''"

Am I missing anything, or is this just a limitation of these operators?
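
For context, here is a minimal sketch of the kind of DAG that hits this on the Airflow 1.9 contrib operator; the DAG id, bucket path, and cluster name are placeholders, not values from my actual setup:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataproc_operator import DataProcPySparkOperator

with DAG('dataproc_region_repro', start_date=datetime(2018, 1, 1),
         schedule_interval=None) as dag:
    submit_job = DataProcPySparkOperator(
        task_id='submit_pyspark',
        main='gs://my-bucket/job.py',        # placeholder GCS path to the PySpark file
        cluster_name='my-regional-cluster',  # lives in e.g. us-east1, not global
        # There is no region parameter to pass here in 1.9, so the hook
        # submits to .../regions/global/jobs:submit and never finds the cluster.
    )
```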


There are 2 answers

fenglu

These DataProc{PySpark|Spark|Hive|Hadoop|..}Operators simply don't support a region argument today. An Airflow issue has been created, and I'll submit a fix in the next few days.
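
Until that fix lands, one possible interim workaround (my own suggestion, not part of this answer) is to bypass the operator and submit the job through the gcloud CLI, which does accept a --region flag. All names below are placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('dataproc_region_workaround', start_date=datetime(2018, 1, 1),
          schedule_interval=None)

# Shell out to gcloud, which supports regional Dataproc endpoints directly.
submit_via_gcloud = BashOperator(
    task_id='submit_pyspark_gcloud',
    bash_command=(
        'gcloud dataproc jobs submit pyspark gs://my-bucket/job.py '
        '--cluster=my-regional-cluster --region=us-east1 '
        '--project=my-project'
    ),
    dag=dag,
)
```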

Mark Goodwin

We were running into the same issue using Google Cloud Composer, which was running Airflow 1.9. We upgraded to Airflow 1.10, which Google had just released, and this fixed the issue. Now when I run the operator it can see the cluster, because it looks in the correct region; previously it always looked in global.
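
For reference, a sketch of what the same task looks like on Airflow 1.10, where the contrib operator accepts a region argument; the names and region here are placeholders, not values from this answer:

```python
from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.dataproc_operator import DataProcPySparkOperator

dag = DAG('dataproc_regional_submit', start_date=datetime(2018, 1, 1),
          schedule_interval=None)

submit_job = DataProcPySparkOperator(
    task_id='submit_pyspark',
    main='gs://my-bucket/job.py',        # placeholder GCS path
    cluster_name='my-regional-cluster',  # placeholder cluster name
    region='us-east1',                   # point submission at the cluster's region
    dag=dag,
)
```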