Can I use a schema-on-read approach when loading/ingesting data into Google BQ (BigQuery)?


I am exploring using BQ fully as the primary storage for a Data Lakehouse architecture.

However, one of the main features of a Data Lakehouse is that its **first layer** (raw, bronze, whatever it's called) is schema-on-read.

Are there any approaches that would let me use BQ for my RAW layer with schema-on-read?

Has anyone seen or done this? Or is this a completely stupid question? :)

For example, I am loading data from an RDBMS (MSSQL, Oracle) via a BQ connector. Even if a column changes its data type, a new column is added, or a column is removed, everything should keep working and the data should still be ingested into BQ. That way, at this RAW stage, I don't have to manage schema evolution.
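To make the requirement concrete, here is a minimal, engine-agnostic Python sketch of schema-on-read: the raw layer keeps each record as an untyped JSON string, and column names and types are applied only when the data is read. The sample rows, `READ_SCHEMA`, and `read_with_schema` are hypothetical illustrations, not part of any BigQuery API. In BigQuery, a roughly analogous pattern would be landing each record in a single `STRING` or `JSON` column and extracting fields at query time (for example with `JSON_VALUE`).

```python
import json
from typing import Any, Callable, Dict, List, Optional

# Hypothetical raw ("bronze") layer: each record is stored exactly as it arrived,
# as a JSON string. No column names or types are enforced at write time.
RAW_ROWS: List[str] = [
    '{"id": 1, "amount": "10.50", "name": "a"}',           # amount arrives as a string
    '{"id": 2, "amount": 7.25}',                            # "name" was removed upstream
    '{"id": 3, "amount": 3, "name": "c", "region": "EU"}',  # "region" was added upstream
]

# Hypothetical read-time schema: column name -> cast function.
# It is applied only when reading, so upstream schema changes never break ingestion.
READ_SCHEMA: Dict[str, Callable[[Any], Any]] = {
    "id": int,
    "amount": float,
    "name": str,
}


def read_with_schema(
    raw_rows: List[str], schema: Dict[str, Callable[[Any], Any]]
) -> List[Dict[str, Optional[Any]]]:
    """Parse raw JSON rows and project them onto `schema` at read time."""
    result = []
    for raw in raw_rows:
        record = json.loads(raw)
        row: Dict[str, Optional[Any]] = {}
        for column, cast in schema.items():
            value = record.get(column)  # a missing column becomes None
            row[column] = None if value is None else cast(value)
        result.append(row)  # columns not in the schema (e.g. "region") are ignored
    return result


rows = read_with_schema(RAW_ROWS, READ_SCHEMA)
for row in rows:
    print(row)
```

The design choice is that the write path never fails on type changes, added columns, or removed columns; any reconciliation happens in the read step, which can be changed without reloading the raw data.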

Thank you, DV

I am trying to build a data ingestion pattern with a schema-on-read approach where the sink is BQ. So far, I have found no documented options.

Answer by Gaurang Shah:

BigQuery lets you define an external table over files stored in Cloud Storage, so you can use that option for the raw layer.
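A minimal sketch of what that could look like: the Python helper below only builds a BigQuery `CREATE OR REPLACE EXTERNAL TABLE` statement with no column list, so the schema is taken from the files themselves (self-describing formats such as Parquet or Avro) or from auto-detection (CSV/JSON) rather than declared up front. The helper name, project, dataset, and bucket path are hypothetical. The resulting SQL could be run in the BigQuery console, with `bq query --use_legacy_sql=false`, or through `Client.query` in the `google-cloud-bigquery` library. Note that the inferred schema is stored with the table definition, so if the source files drift, re-creating the table is likely needed to pick up the new columns.

```python
from typing import List

# Formats assumed for this sketch; BigQuery external tables support others (e.g. ORC) too.
SUPPORTED_FORMATS = {"PARQUET", "AVRO", "CSV", "NEWLINE_DELIMITED_JSON"}


def external_table_ddl(table_id: str, uris: List[str], file_format: str) -> str:
    """Build a CREATE OR REPLACE EXTERNAL TABLE statement without a column list.

    Omitting the column list leaves schema inference to BigQuery instead of
    declaring columns and types at load time.
    """
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format}")
    if not uris or not all(uri.startswith("gs://") for uri in uris):
        raise ValueError("uris must be a non-empty list of gs:// paths")
    uri_list = ", ".join(f"'{uri}'" for uri in uris)
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table_id}`\n"
        f"OPTIONS (format = '{fmt}', uris = [{uri_list}])"
    )


# Hypothetical project, dataset, and bucket names.
ddl = external_table_ddl(
    "my-project.raw.orders",
    ["gs://my-raw-bucket/orders/*.parquet"],
    "parquet",
)
print(ddl)
```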

However, before you choose BigQuery as your primary storage and query engine, please understand these two things.

  1. BigQuery has two storage billing models: logical, which bills uncompressed bytes, and physical, which bills compressed bytes. You can switch your storage billing model to physical. However, your queries will still be charged based on logical (uncompressed) bytes.
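The trade-off in point 1 can be illustrated with simple arithmetic. This Python sketch uses made-up placeholder prices and a made-up compression ratio (check current BigQuery pricing for real rates; per-GiB physical rates are higher than logical ones). It shows that switching to physical billing can lower the storage bill when data compresses well, while on-demand query cost still scales with logical bytes scanned. The function names are hypothetical. Switching a dataset's billing model itself is done with DDL along the lines of `ALTER SCHEMA mydataset SET OPTIONS(storage_billing_model = 'PHYSICAL')`.

```python
from dataclasses import dataclass


@dataclass
class StorageEstimate:
    """Storage cost of the same data under each billing model."""
    logical_cost: float
    physical_cost: float


def storage_costs(
    logical_gib: float,
    compression_ratio: float,
    logical_price_per_gib: float,
    physical_price_per_gib: float,
) -> StorageEstimate:
    # Logical billing counts uncompressed bytes; physical billing counts compressed bytes.
    # Simplification: time-travel and fail-safe bytes, which physical billing also
    # counts, are ignored here.
    physical_gib = logical_gib / compression_ratio
    return StorageEstimate(
        logical_cost=logical_gib * logical_price_per_gib,
        physical_cost=physical_gib * physical_price_per_gib,
    )


def on_demand_query_cost(logical_tib_scanned: float, price_per_tib: float) -> float:
    # On-demand queries are billed on logical (uncompressed) bytes scanned,
    # regardless of the dataset's storage billing model.
    return logical_tib_scanned * price_per_tib


# PLACEHOLDER numbers only, not real BigQuery rates.
estimate = storage_costs(
    logical_gib=1000.0,
    compression_ratio=4.0,
    logical_price_per_gib=1.0,
    physical_price_per_gib=2.0,
)
query_cost = on_demand_query_cost(logical_tib_scanned=10.0, price_per_tib=5.0)
print(estimate, query_cost)
```

With these placeholder inputs, physical billing halves the storage cost, but the query cost is identical under both models because it depends only on logical bytes scanned.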

We learned this the hard way: our data volume tripled when we moved to BigQuery, and so did our cost.