Cannot use the dry-run parameter with BigQueryHook

A BigQuery query job can be dry-run before it is actually executed. A dry run gives an idea of the cost, i.e. how much data the query is going to process in BigQuery. Below is a snippet of the QueryRequest:

{
  "kind": string,
  "query": string,
  "maxResults": integer,
  "defaultDataset": {
    object (DatasetReference)
  },
  "timeoutMs": integer,
  "dryRun": boolean,
  "preserveNulls": boolean,
  ...
}
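
In the jobs.query QueryRequest above, dryRun is a top-level field of the request body. A minimal sketch of such a request, assuming `service` is a discovery-based BigQuery v2 client (for example one returned by googleapiclient.discovery.build("bigquery", "v2")); the helper names are hypothetical. For a dry run, the QueryResponse field totalBytesProcessed should hold the number of bytes the query would process.

```python
from typing import Any, Dict


def build_dry_run_query_request(sql: str) -> Dict[str, Any]:
    """Hypothetical helper: a jobs.query body with dryRun at the top level."""
    return {
        "query": sql,
        "useLegacySql": False,
        "useQueryCache": False,
        "dryRun": True,
    }


def estimate_bytes_with_jobs_query(service: Any, project_id: str, sql: str) -> int:
    """Hypothetical helper: send a dry-run jobs.query and return the estimate.

    `service` is assumed to be a discovery-based BigQuery v2 client.
    """
    response = service.jobs().query(
        projectId=project_id, body=build_dry_run_query_request(sql)
    ).execute()
    # The REST API encodes int64 values as JSON strings, hence int().
    return int(response["totalBytesProcessed"])
```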

I am trying to use the dry-run parameter with BigQueryHook inside Google Cloud Composer, but am not having any luck. Below is my code snippet:

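# Contrib hook path, assuming Airflow 1.10.x as used by Cloud Composer
from airflow.contrib.hooks.bigquery_hook import BigQueryHook


def execute_sql(**kwargs):

    bq_hook = BigQueryHook(bigquery_conn_id='bigquery_default')
    bq_conn = bq_hook.get_conn()
    bq_cursor = bq_conn.cursor()

    #bq_cursor = BigQueryConnection(**kwargs).cursor()
    dryrun_sql = "select * from `{project}.{dataset}.{table}` where utcdate_='2021-01-01'"

    # Note: 'dryRun' is passed inside the 'query' sub-configuration here
    output = bq_cursor.run_with_configuration({
        'query': {
            'query': dryrun_sql,
            'useQueryCache': False,
            'useLegacySql': False,
            'dryRun': True,
        }
    })

    # utils is my own helper module (not shown)
    utils.format_logging("job info: {}, Bytes processed: ".format(output))

    return None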

The call returns a BigQuery job_id. I use that job_id with the BigQuery jobs.get API to inspect the job, and there I can see that I was billed for the query, so I deduce that the dry-run parameter was not applied and the query actually ran.
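A likely cause: in the REST JobConfiguration resource used by jobs.insert, dryRun is a top-level field next to "query", while the query sub-configuration (JobConfigurationQuery) has no dryRun field, so the nested flag in the code above is probably not honoured. A minimal sketch under these assumptions: `service` is a discovery-based BigQuery v2 client, and the helper names are hypothetical. It calls jobs().insert directly because how the hook's run_with_configuration handles a dry-run reply has not been verified here.

```python
from typing import Any, Dict


def build_dry_run_job_configuration(sql: str) -> Dict[str, Any]:
    """Hypothetical helper: JobConfiguration with dryRun beside "query"."""
    return {
        "query": {
            "query": sql,
            "useQueryCache": False,
            "useLegacySql": False,
        },
        # Top level of JobConfiguration, not inside the "query" dict.
        "dryRun": True,
    }


def estimate_bytes_with_jobs_insert(service: Any, project_id: str, sql: str) -> int:
    """Hypothetical helper: insert a dry-run job and return the byte estimate.

    `service` is assumed to be a discovery-based BigQuery v2 client.
    """
    job = service.jobs().insert(
        projectId=project_id,
        body={"configuration": build_dry_run_job_configuration(sql)},
    ).execute()
    # Query statistics carry the estimate; int64 values arrive as strings.
    return int(job["statistics"]["query"]["totalBytesProcessed"])
```

With the contrib BigQueryHook in Airflow 1.10, `bq_hook.get_service()` should return such a client and `bq_hook.project_id` the connection's project; please check both against your Airflow version. Passing the same configuration dict to `bq_cursor.run_with_configuration(...)` might also work, but that is untested.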

Can anybody explain how to use the dry-run parameter with BigQueryHook, or suggest an alternative solution?
