Programmatically deploying and running Beam pipelines on GCP Dataflow


I'm trying to programmatically deploy some Beam pipelines on GCP Dataflow using google-cloud-dataflow, but I'm unsure how this can be done.

These pipelines are already packaged as JARs, and my goal is to use the google-cloud-dataflow SDK to deploy and start them on GCP Dataflow.

There don't seem to be any methods in JobsV1Beta3Client or TemplatesServiceClient to specify the paths to these JARs or to pass in pipeline options.

I've seen some samples but am still not getting it: https://simplesassim.wordpress.com/2022/07/12/how-to-create-a-job-in-google-dataflow/ and https://simplesassim.wordpress.com/2022/07/12/how-to-start-a-job-in-google-dataflow/

These pipelines can be executed locally or on Dataflow as described in https://cloud.google.com/dataflow/docs/guides/deploying-a-pipeline#run-on-dataflow, but I'm looking for a solution that uses the google-cloud-dataflow SDK, as shown in the sketch below.
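For context, the guide's approach amounts to the pipeline selecting DataflowRunner through its own pipeline options and then submitting itself when the JAR runs; a rough sketch (the project, region, and bucket values are placeholders), which is exactly the Beam-side submission I'm trying to avoid:

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class RunOnDataflow {
  public static void main(String[] args) {
    // Standard Beam-side submission: the runner is chosen via pipeline options.
    DataflowPipelineOptions options =
        PipelineOptionsFactory.fromArgs(args).as(DataflowPipelineOptions.class);
    options.setRunner(DataflowRunner.class);
    options.setProject("my-project");               // placeholder project ID
    options.setRegion("us-central1");               // placeholder region
    options.setTempLocation("gs://my-bucket/temp"); // placeholder staging bucket

    Pipeline pipeline = Pipeline.create(options);
    // ... pipeline transforms go here ...
    pipeline.run();
  }
}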

There are also REST APIs at https://cloud.google.com/dataflow/docs/reference/rest, but Google recommends using the client libraries instead.

Does anyone have a clue how this can be done, or am I approaching this the wrong way?


1 Answer

Dhiraj Singh

As commented by @毛三王, to deploy and run Apache Beam pipelines on GCP Dataflow, you can try using TemplatesServiceClient.launchTemplate to launch the pipeline as a Dataflow job.
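A minimal sketch of what that can look like with the Java client library, assuming the pipeline has already been staged as a classic template at a GCS path (the project, region, bucket, job name, and parameter names below are all placeholders):

import com.google.dataflow.v1beta3.LaunchTemplateParameters;
import com.google.dataflow.v1beta3.LaunchTemplateRequest;
import com.google.dataflow.v1beta3.LaunchTemplateResponse;
import com.google.dataflow.v1beta3.RuntimeEnvironment;
import com.google.dataflow.v1beta3.TemplatesServiceClient;

public class LaunchTemplateExample {
  public static void main(String[] args) throws Exception {
    try (TemplatesServiceClient client = TemplatesServiceClient.create()) {
      // Runtime settings for the workers (all values are placeholders).
      RuntimeEnvironment environment =
          RuntimeEnvironment.newBuilder()
              .setTempLocation("gs://my-bucket/temp")
              .setMaxWorkers(2)
              .build();

      // Pipeline options are passed as template parameters, not as a JAR path.
      LaunchTemplateParameters launchParameters =
          LaunchTemplateParameters.newBuilder()
              .setJobName("my-beam-job")
              .putParameters("inputFile", "gs://my-bucket/input.txt") // example pipeline option
              .setEnvironment(environment)
              .build();

      LaunchTemplateRequest request =
          LaunchTemplateRequest.newBuilder()
              .setProjectId("my-project")
              .setLocation("us-central1")
              .setGcsPath("gs://my-bucket/templates/my-template") // staged template spec
              .setLaunchParameters(launchParameters)
              .build();

      LaunchTemplateResponse response = client.launchTemplate(request);
      System.out.println("Launched job: " + response.getJob().getId());
    }
  }
}

The key point is that the client library does not take a JAR path directly: the JAR is first staged as a template (for example by running it once with --templateLocation), and launchTemplate then creates a Dataflow job from that staged template spec together with the runtime parameters and environment.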

Posting this as a community wiki answer for the benefit of anyone who might encounter this use case in the future. Feel free to edit this answer to add more information.