Google Cloud Dataflow: handling schema evolution (addition of columns)


I have a Cloud Pub/Sub topic to which my Dataflow pipeline is subscribed. Each message contains a primary-key column plus one other, arbitrary column. For every message I need to check whether that column already exists in my BigQuery table: if it does, I simply insert the row; if not, I have a method that alters the table schema to add the column. To avoid hitting BigQuery on every element, I want to keep the known columns in a set and cache it via a shared class, so that whenever a new column appears the cache is updated and stays in sync across all workers. Any idea how to update the cache, or something to start with?
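One pattern worth noting: Dataflow workers don't share memory, so a truly global cache isn't possible, but each worker process can hold its own copy (e.g. via `apache_beam.utils.shared.Shared` or a class attribute on the `DoFn`) and re-fetch the live schema from BigQuery on a cache miss before deciding to alter the table. That way workers converge without any cross-worker messaging. A minimal sketch of the cache logic, with `fetch_schema` and `add_column` as hypothetical stand-ins for the real BigQuery calls:

```python
import threading


class ColumnCache:
    """Per-process cache of known table columns (illustrative sketch).

    fetch_schema: () -> iterable of column names, reads the live schema.
    add_column:   (name) -> None, alters the table to add the column.
    Both are assumptions standing in for real BigQuery client calls.
    """

    def __init__(self, fetch_schema, add_column):
        self._lock = threading.Lock()
        self._columns = set()
        self._fetch_schema = fetch_schema
        self._add_column = add_column

    def ensure_column(self, name):
        """Return True if the column had to be added, False if it exists."""
        with self._lock:
            if name in self._columns:
                return False  # cache hit: a plain insert is safe
            # Cache miss: another worker may already have added the
            # column, so refresh from the live schema before altering.
            self._columns = set(self._fetch_schema())
            if name in self._columns:
                return False
            self._add_column(name)  # e.g. tables.patch with the new field
            self._columns.add(name)
            return True
```

Inside a streaming `DoFn` you would call `ensure_column(col)` before writing each row; the lock makes it safe across the threads a single worker runs, and the refresh-on-miss keeps separate workers consistent without coordination.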
