I recently started working with Apache Beam and Dataflow using the Python SDK.
I've written a custom DoFn that returns a PCollection of BigQuery table names. I want a way to iterate over those table names so that I can run further steps on each table. I'd like it to look something like this:
    import apache_beam as beam

    with beam.Pipeline(options=beam_options) as pipeline:
        table_ids = (
            pipeline
            | "InitializeTransferProcess" >> beam.Create([""])
            | "TransferBigQueryData" >> beam.ParDo(CustomerBigQueryDataTransferFn())
        )

        # This is the part I can't get to work: iterating over the table_ids PCollection.
        for table_id in table_ids:
            _ = (
                pipeline
                | f"ReadTable[{table_id}]" >> beam.io.ReadFromBigQuery(table=table_id, method=BQ_READ_METHOD)
                | f"ManipulateTable[{table_id}]" >> beam.ParDo(DoOtherStuffFn())
            )
Is this possible to achieve? I've tried a couple of approaches, but nothing has worked so far.
I tried writing another ParDo and passing table_id as well as the pipeline instance into it, but it seems Beam does not allow that.
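That attempt looked roughly like this (simplified; ReadEachTableFn is just an illustrative name for that second DoFn):

    class ReadEachTableFn(beam.DoFn):
        def __init__(self, pipeline):
            # Passing the pipeline instance into the DoFn.
            self.pipeline = pipeline

        def process(self, table_id):
            # Trying to apply the read transform from inside process() is the part
            # Beam complains about.
            yield (
                self.pipeline
                | f"ReadTable[{table_id}]" >> beam.io.ReadFromBigQuery(table=table_id, method=BQ_READ_METHOD)
            )

    table_ids | "ReadEachTable" >> beam.ParDo(ReadEachTableFn(pipeline))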
I also tried creating an empty PCollection as the initial input of the ReadFromBigQuery transform, but it just does not output anything.
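That was something along these lines (simplified; the hard-coded table name is just a placeholder):

    with beam.Pipeline(options=beam_options) as pipeline:
        # An empty PCollection used only as the input of the read.
        empty = pipeline | "EmptyInput" >> beam.Create([])

        rows = (
            empty
            | "ReadTable" >> beam.io.ReadFromBigQuery(table="my-project:my_dataset.my_table", method=BQ_READ_METHOD)
            | "ManipulateTable" >> beam.ParDo(DoOtherStuffFn())
        )
        # This runs, but ReadTable produces no output downstream.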
Thank you all!