Let's say I have a table dataset.table1 with columns col1 and col2 in BigQuery. I create a view dataset.view1 with DDL 'select * from dataset.table1'.
Can I use SQL or a python library, or any other way (without using data lineage or any other additional paid functionality) to find that col1 and col2 from dataset.table1 are used in dataset.view1 ?
What about if I create a new view dataset.view2 with DDL 'select * from dataset.view1' ? Is it possible to track down that col1 and col2 from dataset.table1 are used in dataset.view2 ?
I know I can find where specific columns are used in views if the columns are explicity stated (view's DDL is select col1, col2 from dataset.table1) in the INFORMATION_SCHEMA. But I wanted to know if I can find where table columns are used in views if not explicitly stated.
As you wrote, data lineage is the right tool for this. I do not know of any other tool doing this task.
A workarount would be to obtail all view queries from
INFORMATION_SCHEMA.VIEWS. Then parse to find yourdataset.table1and replace it with(Select * except(col1) from your dataset.table1). Run this as a dry run. If it fails, the col1 is needed in this view.But think of following view:
Clearly this view only gives out
col2, but there are calculations involvingcol1, which are not presented in the final output. Doing above replacement, the dry query will fail and leading to the wrong conclusion thatcol1is used in the view query.