BigQueryInsertJobOperator sometimes fails without logging Job Status


I have some BigQuery tasks that use BigQueryInsertJobOperator with the following structure:

from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# phases, gcp_project and trf_tasks are assumed to be defined earlier in the DAG file.
for phase in phases:
    trf_tasks[phase] = []

    # One query task per entity in the phase.
    for entity in phases[phase]['entities']:
        trf_task = BigQueryInsertJobOperator(
            task_id=f'{phase}_{entity}',
            project_id=gcp_project,
            location='US',
            configuration={
                "jobType": "QUERY",
                "query": {
                    "query": phases[phase]['transformations'][entity],
                    "destinationTable": {
                        "projectId": gcp_project,
                        "datasetId": phases[phase]['dataset'],
                        "tableId": phases[phase]['target_table_ids'][entity],
                    },
                    "createDisposition": 'CREATE_IF_NEEDED',
                    "writeDisposition": 'WRITE_TRUNCATE',
                    "schemaUpdateOptions": ['ALLOW_FIELD_ADDITION'],
                    "timePartitioning": {"type": 'DAY'},
                    "allowLargeResults": True,
                    "useLegacySql": False,
                },
                # 6 hours, expressed in milliseconds
                "jobTimeoutMs": 21600000,
            },
        )

        trf_tasks[phase].append(trf_task)

Sometimes everything works as expected. But for some tasks (usually the longer-running ones), the last line in the log is "Inserting job airflow_***", and the task fails repeatedly until it reaches the maximum number of retries (currently 5, which is a lot). When I check the BigQuery job named in the log, however, it has usually completed successfully, even if it took a long time.

I tried setting jobTimeoutMs to the maximum value allowed by BigQuery (6 hours), but this seems to be an issue on the Airflow side.
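For reference, the 21600000 used in the configuration is just 6 hours converted to milliseconds. A minimal sketch of that conversion (the constant names here are illustrative, not part of the DAG):

```python
# Convert the 6-hour limit mentioned above into milliseconds.
# HOURS and JOB_TIMEOUT_MS are illustrative names, not from the original DAG.
HOURS = 6
JOB_TIMEOUT_MS = HOURS * 60 * 60 * 1000  # hours -> minutes -> seconds -> ms

print(JOB_TIMEOUT_MS)
```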

I'm a bit lost about what needs to change on the Airflow side. Do some Airflow settings need adjusting, or is something in my code causing the Airflow task to fail after a certain time?
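One direction that might be worth testing, sketched here under assumptions rather than as a confirmed fix: if the Airflow task process dies while the BigQuery job keeps running, a retry could reattach to the existing job instead of submitting a new one. The sketch assumes a recent apache-airflow-providers-google release in which BigQueryInsertJobOperator accepts `force_rerun`, `reattach_states`, `deferrable` and `poll_interval`. The names `REATTACH_KWARGS` and `with_reattach` are hypothetical helpers, and the chosen values are untested guesses.

```python
# Hypothetical extra keyword arguments for BigQueryInsertJobOperator.
# Assumption: the installed Google provider supports all four parameters.
REATTACH_KWARGS = {
    # Derive a stable job id per task run instead of a fresh one on every try.
    "force_rerun": False,
    # Job states in which a retry may attach to the already-submitted job.
    "reattach_states": {"PENDING", "RUNNING", "DONE"},
    # Wait for the job in the triggerer instead of holding a worker slot
    # (assumes a triggerer process is running in the deployment).
    "deferrable": True,
    # Seconds between status checks while deferred (arbitrary value).
    "poll_interval": 30.0,
}


def with_reattach(base_kwargs):
    """Return a copy of base_kwargs with the reattach settings merged in."""
    merged = dict(base_kwargs)
    merged.update(REATTACH_KWARGS)
    return merged


# Usage sketch inside the loop (not executed here):
# trf_task = BigQueryInsertJobOperator(**with_reattach({
#     "task_id": f"{phase}_{entity}",
#     "project_id": gcp_project,
#     "location": "US",
#     "configuration": {...},
# }))
```

If the tasks are being killed for another reason (for example a worker being evicted or running out of memory), the scheduler and worker logs for the failed attempt would still need to be checked, since the task log stopping at "Inserting job" suggests the process ended without reporting an error.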


There are 0 answers