How to find if a specific column is used across views in BigQuery (when the columns are not explicitly written in the view DDL)?

16 views Asked by At

Let's say I have a table dataset.table1 with columns col1 and col2 in BigQuery. I create a view dataset.view1 with DDL 'select * from dataset.table1'.

Can I use SQL or a python library, or any other way (without using data lineage or any other additional paid functionality) to find that col1 and col2 from dataset.table1 are used in dataset.view1 ?

What about if I create a new view dataset.view2 with DDL 'select * from dataset.view1' ? Is it possible to track down that col1 and col2 from dataset.table1 are used in dataset.view2 ?

I know I can find where specific columns are used in views if the columns are explicity stated (view's DDL is select col1, col2 from dataset.table1) in the INFORMATION_SCHEMA. But I wanted to know if I can find where table columns are used in views if not explicitly stated.

1

There are 1 answers

0
Samuel On

As you wrote, data lineage is the right tool for this. I do not know of any other tool doing this task.

A workarount would be to obtail all view queries from INFORMATION_SCHEMA.VIEWS. Then parse to find your dataset.table1 and replace it with (Select * except(col1) from your dataset.table1). Run this as a dry run. If it fails, the col1 is needed in this view.

But think of following view:

SELECT col2 FROM 
(
Select col1 || "add" as test_column, * 
FROM dataset.table1
)

Clearly this view only gives out col2, but there are calculations involving col1, which are not presented in the final output. Doing above replacement, the dry query will fail and leading to the wrong conclusion that col1 is used in the view query.