BigQuery bulk update existing table rows from json file after adding new columns to the table


I have a BigQuery table with around 2M rows that were loaded from a JSON file. The JSON file actually has 10 fields, but the table has only 7 columns as per the initial DDL. I have now altered the table and added the remaining three columns. After the alteration, the newly added columns are filled with NULL.

Now I want to backfill the existing 2M rows, but only the three newly added columns, with the actual data from the JSON file. How can I bulk update the table so that the existing column values remain untouched and only the new column values are updated?

Note: The table has a streaming buffer enabled and is NOT partitioned.


1 Answer

Yun Zhang (best answer):

"Now I want to backfill the existing 2M rows, but only the three newly added columns, with the actual data from the JSON file."

Since loading data is free of charge, I'd reload the whole table with the WRITE_TRUNCATE option to overwrite the existing data.
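A minimal sketch of that reload in GoogleSQL, assuming the file sits in Cloud Storage (the LOAD DATA ... OVERWRITE statement corresponds to a load job with the WRITE_TRUNCATE write disposition; the dataset, table, and URI below are placeholders):

    LOAD DATA OVERWRITE mydataset.mytable
    FROM FILES (
      format = 'JSON',                          -- newline-delimited JSON
      uris = ['gs://my-bucket/path/data.json']  -- placeholder URI
    );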

What you said confuses me because:

  • If your 2M rows in the BQ table have the same data as what's in the JSON file, why do you care whether they are touched or not?
  • If your 2M rows in the BQ table have been altered in some way, how do you expect the rows in the JSON file to match the altered data on a per-row basis (to backfill the missing columns)?

-- Update: based on the comments, it seems the loaded rows have been altered in some way. In that case:

  • For your existing data, if there is no (logical) primary key you can use to match the rows, then it is technically impossible to "match and update".
  • If your existing data do have a logical primary key, and you don't mind the cost, you could load the full JSON file into a temporary table and then use DML to backfill the missing columns (see the sketch after this list).
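A rough sketch of that second option, assuming the full file has been loaded into a staging table mydataset.mytable_full and that id is the logical primary key (all table and column names here are placeholders):

    -- Backfill only the three new columns; the existing columns stay untouched.
    UPDATE mydataset.mytable AS t
    SET
      new_col_1 = s.new_col_1,
      new_col_2 = s.new_col_2,
      new_col_3 = s.new_col_3
    FROM mydataset.mytable_full AS s
    WHERE t.id = s.id;

Keep in mind that a DML UPDATE cannot modify rows that are still in the streaming buffer, so you may need to wait for the buffer to flush before running it.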

For your future data loading, if you want the loading to be incremental (either in rows or in columns), it would be better to keep the loaded table untouched so that it represents the 'full fact', and to keep the 'altered rows' in a separate table, assuming you have a logical primary key to match them.
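One way to sketch that layout, assuming an id primary key and illustrative table/column names: keep the raw load in one table, keep the manual changes in an overrides table, and expose the combined result through a view.

    CREATE OR REPLACE VIEW mydataset.facts_current AS
    SELECT
      r.id,
      COALESCE(o.col_a, r.col_a) AS col_a,  -- prefer the altered value when present
      COALESCE(o.col_b, r.col_b) AS col_b
    FROM mydataset.raw_facts AS r
    LEFT JOIN mydataset.overrides AS o
      ON r.id = o.id;

With this split, reloading raw_facts with WRITE_TRUNCATE never clobbers the manual changes.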