Using schema update option in beam.io.writetobigquery


I am loading a bunch of log files into BigQuery using Apache Beam on Dataflow. The file format can change over time as new columns are added to the files. I see there is a schema update option, ALLOW_FIELD_ADDITION.

Does anyone know how to use it? This is how my write-to-BigQuery step looks:

| 'write to bigquery' >> beam.io.WriteToBigQuery('project:datasetId.tableId', write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)

Answer by Judah Rand:

I haven't actually tried this yet, but digging into the documentation, it seems you can pass extra configuration to the BigQuery load job through the additional_bq_parameters argument of WriteToBigQuery. In this case it might look something like:

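| 'write to bigquery' >> beam.io.WriteToBigQuery(
    'project:datasetId.tableId',
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
    # schemaUpdateOptions is a load-job setting, so use file loads
    # (the default for batch pipelines) rather than streaming inserts.
    method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
    additional_bq_parameters={
        'schemaUpdateOptions': [
            'ALLOW_FIELD_ADDITION',    # new NULLABLE columns may be added
            'ALLOW_FIELD_RELAXATION',  # REQUIRED columns may become NULLABLE
        ]
    }
)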
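Because the option names are plain strings, a typo such as ALLOW_FILED_ADDITION would only surface when the load job runs. A minimal sketch of a helper that builds the dictionary and rejects unknown names up front is below. The names `schema_update_parameters` and `params_for_destination` are hypothetical, and the second function assumes (per the Beam docstring as I read it) that `additional_bq_parameters` also accepts a callable that receives the destination table and returns a dict; the "logs" table-name rule is purely illustrative.

```python
# Option names accepted by the BigQuery load-job field `schemaUpdateOptions`.
VALID_SCHEMA_UPDATE_OPTIONS = frozenset({
    "ALLOW_FIELD_ADDITION",
    "ALLOW_FIELD_RELAXATION",
})


def schema_update_parameters(*options):
    """Build an additional_bq_parameters dict, rejecting unknown option names."""
    unknown = [opt for opt in options if opt not in VALID_SCHEMA_UPDATE_OPTIONS]
    if unknown:
        raise ValueError(f"Unknown schemaUpdateOptions: {unknown}")
    return {"schemaUpdateOptions": list(options)}


def params_for_destination(destination):
    """Hypothetical callable form: enable schema updates only for log tables."""
    # TableReference objects expose .tableId; plain strings look like
    # 'project:dataset.table', so take the part after the last dot.
    table_id = getattr(destination, "tableId", None) or str(destination).rsplit(".", 1)[-1]
    if table_id.startswith("logs"):
        return schema_update_parameters("ALLOW_FIELD_ADDITION", "ALLOW_FIELD_RELAXATION")
    return {}
```

Either form would then be passed as `additional_bq_parameters=schema_update_parameters('ALLOW_FIELD_ADDITION')` or `additional_bq_parameters=params_for_destination`; the callable is only worth it if different destination tables need different settings.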

Oddly, the Java SDK exposes this directly (BigQueryIO.Write.withSchemaUpdateOptions), but it doesn't seem to have made its way into the Python SDK as a dedicated argument.
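Whichever SDK you use, these options only matter when the schema sent with the load differs from the table's current one, so the new columns presumably also need to appear in the schema you give WriteToBigQuery. As a rough illustration, the sketch below compares two schemas written as BigQuery-style field lists (dicts with `name`, `type`, and optional `mode`) and reports which schemaUpdateOptions the change would need. The function name `required_schema_update_options` is hypothetical; it only checks top-level fields, does not normalize type aliases such as INT64/INTEGER, and encodes these rules as an assumption: appended columns must not be REQUIRED, and REQUIRED to NULLABLE needs ALLOW_FIELD_RELAXATION.

```python
def _mode(field):
    # BigQuery treats a missing mode as NULLABLE.
    return field.get("mode", "NULLABLE").upper()


def required_schema_update_options(old_fields, new_fields):
    """Return the sorted schemaUpdateOptions needed to go from old to new schema.

    Only top-level fields are compared (nested RECORD fields are not inspected).
    New REQUIRED columns, type changes, and mode changes other than
    REQUIRED -> NULLABLE raise ValueError, since these options don't cover them.
    """
    # BigQuery column names are case-insensitive.
    old_by_name = {f["name"].lower(): f for f in old_fields}
    needed = set()
    for field in new_fields:
        old = old_by_name.get(field["name"].lower())
        if old is None:
            if _mode(field) == "REQUIRED":
                raise ValueError(f"Cannot append a new REQUIRED column: {field['name']}")
            needed.add("ALLOW_FIELD_ADDITION")
            continue
        if old["type"].upper() != field["type"].upper():
            raise ValueError(f"Type change not covered: {field['name']}")
        old_mode, new_mode = _mode(old), _mode(field)
        if old_mode == "REQUIRED" and new_mode == "NULLABLE":
            needed.add("ALLOW_FIELD_RELAXATION")
        elif old_mode != new_mode:
            raise ValueError(f"Mode change not covered: {field['name']}")
    return sorted(needed)


old_schema = [
    {"name": "ts", "type": "TIMESTAMP", "mode": "REQUIRED"},
    {"name": "message", "type": "STRING"},
]
new_schema = [
    {"name": "ts", "type": "TIMESTAMP", "mode": "NULLABLE"},
    {"name": "message", "type": "STRING"},
    {"name": "level", "type": "STRING"},  # column added to the log files
]
print(required_schema_update_options(old_schema, new_schema))
# ['ALLOW_FIELD_ADDITION', 'ALLOW_FIELD_RELAXATION']
```

For the log-file case in the question, only the added-column branch should normally fire, so ALLOW_FIELD_ADDITION alone may be enough.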