Using elements of a PCollection as table arguments for ReadFromBigQuery (BigQuery IO)


I recently started working with Apache Beam and Dataflow using the Python SDK.

I've developed a custom DoFn that returns a PCollection of BigQuery table names. I want a way to iterate over those table names so that I can apply further steps to each table. It could look like this:

with beam.Pipeline(options=beam_options) as pipeline:
    table_ids = (
        pipeline
        | "InitializeTransferProcess" >> beam.Create([""])
        | "TransferBigQueryData" >> beam.ParDo(CustomerBigQueryDataTransferFn())
    )

    for table_id in table_ids:
        (
            pipeline
            | f"ReadTable[{table_id}]" >> beam.io.ReadFromBigQuery(table=table_id, method=BQ_READ_METHOD)
            | f"ManipulateTable[{table_id}]" >> beam.ParDo(DoOtherStuffFn())
        )

Is this possible to achieve? I've tried a few approaches, but nothing works:

  • I tried creating another ParDo and passing table_id along with the pipeline instance into it, but it seems Beam does not allow that (see the sketch after this list).

  • I tried creating an empty PCollection as the initial input of the ReadFromBigQuery transform, but it simply does not output anything.
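
Roughly what the first attempt looked like (the class and attribute names here are placeholders I made up for illustration, not working code):

    import apache_beam as beam

    class ReadTableFn(beam.DoFn):
        # Placeholder DoFn illustrating the attempt: keep a reference to the
        # pipeline and try to apply ReadFromBigQuery from inside process().
        def __init__(self, pipeline):
            # Beam rejects this: a Pipeline object cannot be pickled and
            # shipped to the workers.
            self.pipeline = pipeline

        def process(self, table_id):
            # Applying a transform here also fails: the pipeline graph is
            # already fixed by the time process() runs on a worker.
            rows = self.pipeline | beam.io.ReadFromBigQuery(table=table_id)
            yield rows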

Thank you all!
