I am trying to create a Dataflow job that runs daily with Cloud Scheduler. I need to get data from an external API using GET requests, so I need the current date as an input. However, when I export the Dataflow job as a template for scheduling, the date input stays fixed at the time the template was exported and is not updated daily. I have been looking around for a solution and came across ValueProvider, but my pipeline, which starts with apache_beam.transforms.Create,
always returns the error 'RuntimeValueProvider(option: test, type: str, default_value: 'killme').get() not called from a runtime context' when the ValueProvider is not specified.
Is there any way I can overcome this? It seems like such a simple problem, yet I cannot make it work no matter what I try. I would appreciate any ideas!
You can use the ValueProvider interface to pass runtime parameters to your pipeline. To access the value within a DoFn, pass the ValueProvider in as a parameter and call .get() on it inside the DoFn rather than while the pipeline is being constructed, similar to the example here:
https://beam.apache.org/documentation/patterns/pipeline-options/#retroactively-logging-runtime-parameters
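As a minimal sketch of that pattern applied to the daily-date case (not taken from the linked page): start the pipeline with a fixed placeholder element in Create, and read the ValueProvider only inside the DoFn. The option name `run_date`, the class names, the helper `resolve_run_date`, and the URL are all hypothetical. The fallback to the current UTC date when the option is left empty is an assumption about what a scheduled job would want. The import guard is only there so the date helper can be tried without Beam installed.

```python
import datetime

try:
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions
except ImportError:  # Beam not installed: only the date helper is usable
    beam = None


def resolve_run_date(raw, today=None):
    """Parse an ISO date string, or fall back to today's UTC date if empty."""
    if raw:
        return datetime.date.fromisoformat(raw)
    return today or datetime.datetime.now(datetime.timezone.utc).date()


if beam is not None:

    class DailyOptions(PipelineOptions):
        @classmethod
        def _add_argparse_args(cls, parser):
            # Hypothetical option; an empty default means "use the current date".
            parser.add_value_provider_argument('--run_date', type=str, default='')

    class BuildRequestUrl(beam.DoFn):
        def __init__(self, run_date_vp):
            # Store the ValueProvider itself; do NOT call .get() here,
            # because __init__ runs while the template is being built.
            self._run_date_vp = run_date_vp

        def process(self, base_url):
            # .get() is safe here: process() runs on the workers at runtime.
            run_date = resolve_run_date(self._run_date_vp.get())
            yield f'{base_url}?date={run_date.isoformat()}'

    def build_pipeline(argv=None):
        options = DailyOptions(argv)
        pipeline = beam.Pipeline(options=options)
        (
            pipeline
            # Create only receives a constant, so nothing is resolved early.
            | beam.Create(['https://example.com/api'])  # placeholder URL
            | beam.ParDo(BuildRequestUrl(options.run_date))
        )
        return pipeline
```

Because the date is computed inside process(), each scheduled run of the template picks up the date of that run, even if Cloud Scheduler passes no `run_date` at all.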
You may also want to have a look at Flex Templates:
https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates
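With a Flex Template the pipeline graph is built when the job is launched rather than when the template is created, so ordinary command-line arguments are resolved on each run. A rough sketch of the argument handling, assuming a hypothetical `--run_date` flag that defaults to the current UTC date:

```python
import argparse
import datetime


def parse_launch_args(argv=None):
    """Split off the hypothetical --run_date flag; default to today's UTC date."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--run_date', default=None,
                        help='ISO date to fetch; defaults to the launch date')
    known, pipeline_args = parser.parse_known_args(argv)
    if known.run_date is None:
        # Evaluated at launch, so every scheduled run gets a fresh date.
        today = datetime.datetime.now(datetime.timezone.utc).date()
        known.run_date = today.isoformat()
    # pipeline_args holds the remaining flags to hand to PipelineOptions.
    return known, pipeline_args
```

The returned `known.run_date` can then be used directly when constructing the pipeline, including as an element passed to Create.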