What is the best practice for accessing Bigtable from a streaming Dataflow job?

I need to access Bigtable from one of the transformations in a streaming Dataflow job. As far as I know, there are two ways:

1) Create the connection to Bigtable in the startBundle method of the DoFn and read data from Bigtable in the processElement method. With this approach, the Dataflow SDK creates a new connection to Bigtable every time a new element arrives in the stream.

2) Create the Bigtable connection when the transform object is constructed and use it in the processElement method. However, the Dataflow SDK creates the object, serializes it, and recreates it on the worker node, so is the connection still active on the worker? And in streaming mode, is it a good idea to keep a Bigtable connection open for a long period?

Or is there another, more efficient way to achieve this?

Thanks.

There is 1 answer

Solomon Duskis (best answer)

AbstractCloudBigtableTableDoFn maintains the connection in the most efficient way we could think of, which is essentially a singleton per VM. It has a getConnection() method that gives you access to a Connection in a managed way.

FWIW, the class is in the bigtable-hbase-dataflow project and not the DataflowSDK.
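
To illustrate the "singleton per VM" idea, here is a minimal plain-Java sketch. It is not the code of AbstractCloudBigtableTableDoFn and uses no Bigtable or Dataflow classes: FakeConnection, ConnectionHolder, LookupFn, and roundTrip are all hypothetical names made up for this example. It only shows why a connection kept in static, lazily initialized state is unaffected by the function object being serialized and recreated on a worker, and why it is opened once rather than per element.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedConnectionSketch {

    /** Hypothetical stand-in for an expensive client connection (not a Bigtable class). */
    static final class FakeConnection {
        static final AtomicInteger OPENED = new AtomicInteger();

        FakeConnection() {
            OPENED.incrementAndGet(); // count how many connections get opened
        }

        String lookup(String key) {
            return "value-for-" + key;
        }
    }

    /** Holds one lazily created connection per JVM; it is never serialized. */
    static final class ConnectionHolder {
        private static FakeConnection connection;

        static synchronized FakeConnection get() {
            if (connection == null) {
                connection = new FakeConnection();
            }
            return connection;
        }
    }

    /** Stand-in for a DoFn: serializable, with no connection field of its own. */
    static final class LookupFn implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String prefix;

        LookupFn(String prefix) {
            this.prefix = prefix;
        }

        String processElement(String element) {
            // The connection is obtained on the worker, after deserialization.
            return ConnectionHolder.get().lookup(prefix + element);
        }
    }

    /** Serializes and deserializes an object, mimicking shipping it to a worker. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (Exception e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    public static void main(String[] args) {
        LookupFn first = roundTrip(new LookupFn("row-"));
        LookupFn second = roundTrip(new LookupFn("row-"));
        System.out.println(first.processElement("1"));
        System.out.println(second.processElement("2"));
        System.out.println("connections opened: " + FakeConnection.OPENED.get());
    }
}
```

Both function instances share the single connection even after being serialized and recreated, which is the property the question's second option lacks when the connection is stored as an ordinary instance field.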