Are there plans to enable Cloud Dataflow to write data to Cloud Bigtable? Is it even possible?
Adding a custom Sink to handle the IO would probably be the clean choice.
As a workaround, I tried connecting to Bigtable (same project) in a simple DoFn, opening the connection and table in the startBundle step and closing them in finishBundle (sketched below).
Moreover, I added the bigtable-hbase jar (0.1.5) to the classpath and a modified version of hbase-site.xml to the resources folder, which gets picked up.
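For reference, this is roughly what I mean, as a minimal sketch assuming the pre-Beam com.google.cloud.dataflow.sdk DoFn API and the HBase 1.0 client API that bigtable-hbase 0.1.5 targets; the table, column family, and qualifier names are placeholders, and the connection settings come from the hbase-site.xml on the classpath:

```java
import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

import com.google.cloud.dataflow.sdk.transforms.DoFn;

public class BigtableWriteFn extends DoFn<String, Void> {
  private transient Connection connection;
  private transient Table table;

  @Override
  public void startBundle(Context c) throws IOException {
    // hbase-site.xml on the classpath points the HBase client at Cloud Bigtable.
    connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
    table = connection.getTable(TableName.valueOf("my-table")); // placeholder table name
  }

  @Override
  public void processElement(ProcessContext c) throws IOException {
    // Use the element as the row key and write a single cell (placeholder family/qualifier/value).
    Put put = new Put(Bytes.toBytes(c.element()));
    put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("qual"), Bytes.toBytes("value"));
    table.put(put);
  }

  @Override
  public void finishBundle(Context c) throws IOException {
    table.close();
    connection.close();
  }
}
```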
When running in the cloud, I get an "NPN/ALPN extensions not installed" exception.
When running locally, I get an exception stating that ComputeEngineCredentials cannot find the metadata server, despite having set GOOGLE_APPLICATION_CREDENTIALS to the generated JSON key file.
Any help would be greatly appreciated.
Cloud Bigtable requires the NPN/ALPN networking jar. This is currently not installed on the Dataflow workers, so accessing Cloud Bigtable directly from a ParDo won't work.
One possible workaround is to use the HBase REST API: set up a REST server on a VM outside of Dataflow that accesses Cloud Bigtable. These instructions might help.
You could then issue REST requests to this REST server from your pipeline. This could get somewhat complicated if you're sending a lot of requests (i.e. processing large amounts of data), since you'd need to set up multiple instances of the REST server and load balance across them.
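As a rough illustration of what such a request could look like, here is a sketch that assumes the stock HBase REST (Stargate) JSON schema and a REST server reachable at rest-server-vm:8080; the host, table, row key, and column names are all placeholders:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class HBaseRestPutExample {
  // The HBase REST JSON schema expects row keys, column names, and values base64-encoded.
  private static String b64(String s) {
    return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  public static void main(String[] args) throws Exception {
    String host = "http://rest-server-vm:8080"; // placeholder REST server address
    String table = "my-table";                  // placeholder table name
    String rowKey = "row-1";                    // placeholder row key

    // One Put: write "value" into column family "cf", qualifier "qual" of rowKey.
    String body = String.format(
        "{\"Row\":[{\"key\":\"%s\",\"Cell\":[{\"column\":\"%s\",\"$\":\"%s\"}]}]}",
        b64(rowKey), b64("cf:qual"), b64("value"));

    URL url = new URL(host + "/" + table + "/" + rowKey);
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("PUT");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }
    System.out.println("HTTP " + conn.getResponseCode());
    conn.disconnect();
  }
}
```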