Support for Cloud Bigtable as Sink in Cloud Dataflow


Are there plans to enable Cloud Dataflow to write data to Cloud Bigtable? Is it even possible?

Adding a custom Sink to handle the IO would probably be the clean choice.

As a workaround, I tried connecting to a Bigtable instance (in the same project) from a simple DoFn, opening the connection and table in the startBundle step and closing them in finishBundle.

Moreover, I added the bigtable-hbase jar (0.1.5) to the classpath and a modified version of hbase-site.xml to the resources folder, which gets picked up.
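For reference, here is a minimal sketch of that workaround, assuming the Dataflow 1.x SDK and the HBase 1.x client API (which the bigtable-hbase jar shims); the table, column family, and column names are placeholders:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import com.google.cloud.dataflow.sdk.transforms.DoFn;

// Sketch of a DoFn that writes each element as a row to Bigtable via the
// HBase client, opening the connection per bundle and closing it afterwards.
public class WriteToBigtableFn extends DoFn<String, Void> {

  private transient Connection connection;
  private transient Table table;

  @Override
  public void startBundle(Context c) throws IOException {
    // hbase-site.xml on the classpath supplies the Bigtable connection settings.
    Configuration config = HBaseConfiguration.create();
    connection = ConnectionFactory.createConnection(config);
    table = connection.getTable(TableName.valueOf("my-table")); // placeholder table name
  }

  @Override
  public void processElement(ProcessContext c) throws IOException {
    String value = c.element();
    // Use the element as both row key and cell value; adjust to your schema.
    Put put = new Put(Bytes.toBytes(value));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(value));
    table.put(put);
  }

  @Override
  public void finishBundle(Context c) throws IOException {
    if (table != null) {
      table.close();
    }
    if (connection != null) {
      connection.close();
    }
  }
}
```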

When running in the cloud, I get an exception saying the NPN/ALPN extensions are not installed.

When running locally, I get an exception stating that ComputeEngineCredentials cannot find the metadata server, despite having set GOOGLE_APPLICATION_CREDENTIALS to the generated JSON key file.

Any help would be greatly appreciated.


There are 2 answers

Answer by Jeremy Lewi (accepted):

Cloud Bigtable requires the NPN/ALPN networking jar, which is currently not installed on the Dataflow workers, so accessing Cloud Bigtable directly from a ParDo won't work.

One possible workaround is to use the HBase REST API: set up a REST server on a VM outside of Dataflow to access Cloud Bigtable. These instructions might help.

You could then issue REST requests to that server from your pipeline. This could get somewhat complicated if you're sending a lot of requests (i.e. processing large amounts of data), since you would need to run multiple instances of the REST server and load balance across them.
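To illustrate, here is a rough sketch of a single write through the HBase REST gateway, assuming a hypothetical host name rest-server-vm for the VM running the gateway, placeholder table/row/column names, and the gateway's standard JSON row format (row keys, column names, and values base64-encoded):

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class HbaseRestPutExample {
  public static void main(String[] args) throws Exception {
    // Hypothetical address of the VM running the HBase REST gateway.
    String restEndpoint = "http://rest-server-vm:8080";
    String table = "my-table"; // placeholder table name
    String rowKey = "row-1";   // placeholder row key

    // The HBase REST API expects base64-encoded keys, column names, and values.
    String row = base64(rowKey);
    String column = base64("cf:col");
    String value = base64("hello bigtable");

    String body = "{\"Row\":[{\"key\":\"" + row + "\","
        + "\"Cell\":[{\"column\":\"" + column + "\",\"$\":\"" + value + "\"}]}]}";

    // PUT the row to /<table>/<rowKey> on the REST gateway.
    URL url = new URL(restEndpoint + "/" + table + "/" + rowKey);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP response code: " + conn.getResponseCode());
  }

  private static String base64(String s) {
    return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }
}
```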

Answer by Solomon Duskis:

We now have a Cloud Bigtable / Dataflow connector. You can see more at: https://cloud.google.com/bigtable/docs/dataflow-hbase
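For completeness, a rough sketch of what a write pipeline using that connector might look like, assuming the bigtable-hbase-dataflow artifact and the Dataflow 1.x SDK; the project, instance, and table IDs are placeholders, and the configuration builder's method names have varied across connector versions:

```java
import org.apache.hadoop.hbase.client.Mutation;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

import com.google.cloud.bigtable.dataflow.CloudBigtableIO;
import com.google.cloud.bigtable.dataflow.CloudBigtableTableConfiguration;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Create;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;

public class BigtableConnectorExample {
  public static void main(String[] args) {
    // Placeholder project / instance / table identifiers.
    CloudBigtableTableConfiguration config = new CloudBigtableTableConfiguration.Builder()
        .withProjectId("my-project")
        .withInstanceId("my-instance") // older connector versions identify the cluster differently
        .withTableId("my-table")
        .build();

    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
    CloudBigtableIO.initializeForWrite(p); // registers the coders needed for Mutation

    p.apply(Create.of("alpha", "beta"))
     .apply(ParDo.of(new DoFn<String, Mutation>() {
       @Override
       public void processElement(ProcessContext c) {
         // Turn each element into an HBase Put; adjust row key and columns to your schema.
         Put put = new Put(Bytes.toBytes(c.element()));
         put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col"), Bytes.toBytes(c.element()));
         c.output(put);
       }
     }))
     .apply(CloudBigtableIO.writeToTable(config));

    p.run();
  }
}
```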