I recently started working with Apache Beam and Dataflow using the Python SDK.
I've written a custom DoFn that returns a PCollection of BigQuery table names. I want a way to iterate over those table names so that I can run further steps on each table. I'd like it to look something like this:
    import apache_beam as beam

    with beam.Pipeline(options=beam_options) as pipeline:
        table_ids = (
            pipeline
            | "InitializeTransferProcess" >> beam.Create([""])
            | "TransferBigQueryData" >> beam.ParDo(CustomerBigQueryDataTransferFn())
        )

        # This is the part I can't get to work: iterating over the table_ids PCollection.
        for table_id in table_ids:
            _ = (
                pipeline
                | f"ReadTable[{table_id}]" >> beam.io.ReadFromBigQuery(table=table_id, method=BQ_READ_METHOD)
                | f"ManipulateTable[{table_id}]" >> beam.ParDo(DoOtherStuffFn())
            )
Is this possible to achieve? I've tried a couple of approaches, but nothing has worked so far.
I tried writing another ParDo and passing table_id as well as the pipeline instance into it, but it seems Beam does not allow that.
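That attempt looked roughly like this (simplified; ReadEachTableFn is just an illustrative name for that second DoFn):

    class ReadEachTableFn(beam.DoFn):
        def __init__(self, pipeline):
            # Passing the pipeline instance into the DoFn.
            self.pipeline = pipeline

        def process(self, table_id):
            # Trying to apply the read transform from inside process() is the part
            # Beam complains about.
            yield (
                self.pipeline
                | f"ReadTable[{table_id}]" >> beam.io.ReadFromBigQuery(table=table_id, method=BQ_READ_METHOD)
            )

    table_ids | "ReadEachTable" >> beam.ParDo(ReadEachTableFn(pipeline))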
I also tried creating an empty PCollection as the initial input of the ReadFromBigQuery transform, but it just does not output anything.
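That was something along these lines (simplified; the hard-coded table name is just a placeholder):

    with beam.Pipeline(options=beam_options) as pipeline:
        # An empty PCollection used only as the input of the read.
        empty = pipeline | "EmptyInput" >> beam.Create([])

        rows = (
            empty
            | "ReadTable" >> beam.io.ReadFromBigQuery(table="my-project:my_dataset.my_table", method=BQ_READ_METHOD)
            | "ManipulateTable" >> beam.ParDo(DoOtherStuffFn())
        )
        # This runs, but ReadTable produces no output downstream.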
Thank you all!