Problem
Tried to run BigQueryExampleGen in the interactive TFX context, but got the following error:
InvalidUserInputError: Request missing required parameter projectId [while running 'InputToRecord/QueryTable/ReadFromBigQuery/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']
Steps
Set up the GCP project and the interactive TFX context for BigQueryExampleGen.
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "path_to_credential_file"
from tfx.v1.extensions.google_cloud_big_query import BigQueryExampleGen
from tfx.v1.components import (
    StatisticsGen,
    SchemaGen,
)
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
%load_ext tfx.orchestration.experimental.interactive.notebook_extensions.skip
context = InteractiveContext(pipeline_root='./data/artifacts')
Run the BigQueryExampleGen.
# PROJECT_ID is assumed to be defined earlier as the GCP project ID.
query = """
SELECT 
    * EXCEPT (trip_start_timestamp, ML_use)
FROM 
    {PROJECT_ID}.public_dataset.chicago_taxitrips_prep
""".format(PROJECT_ID=PROJECT_ID)
example_gen = context.run(
    BigQueryExampleGen(query=query)
)
Got the error.
InvalidUserInputError: Request missing required parameter projectId [while running 'InputToRecord/QueryTable/ReadFromBigQuery/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']
Data
See mlops-with-vertex-ai/01-dataset-management.ipynb to set up the BigQuery dataset for the Chicago Taxi Trips dataset.
 
                        
Project ID
To run on GCP, you need to provide the project ID via the beam_pipeline_args argument. However, it still fails with another error.
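A minimal sketch of passing the project ID, assuming Beam's standard `--project` pipeline option is the one meant here. The helper name `project_beam_args` and the placeholder project ID are hypothetical, and the `context.run(...)` call is left as a comment because it needs the TFX notebook session from the question.

```python
def project_beam_args(project_id):
    """Build Beam pipeline args carrying only the GCP project ID.

    Illustrative helper, not part of TFX. `--project` is assumed to be
    the option that supplies the missing projectId parameter.
    """
    if not project_id:
        raise ValueError("project_id must be a non-empty string")
    return ["--project", project_id]


# Placeholder value for illustration only; use your own project ID.
beam_args = project_beam_args("my-gcp-project")

# In the notebook, the args would be passed along these lines:
# example_gen = context.run(
#     BigQueryExampleGen(query=query),
#     beam_pipeline_args=beam_args,
# )
```

As noted above, the project ID alone was not enough in this case.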
GCS Bucket
It looks like, inside GCP, the interactive context runs BigQueryExampleGen via Dataflow, so you also need to provide a GCS bucket URL via the beam_pipeline_args argument.
Documentation
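As a sketch of the GCS bucket fix described above: the text only says that a GCS bucket URL is needed, so using Beam's `--temp_location` option for it is an assumption. The helper `gcp_beam_args`, the project ID, and the bucket name are hypothetical placeholders.

```python
def gcp_beam_args(project_id, gcs_bucket_url):
    """Build Beam pipeline args with the project ID and a GCS bucket URL.

    Illustrative helper, not part of TFX.
    """
    if not project_id:
        raise ValueError("project_id must be a non-empty string")
    if not gcs_bucket_url.startswith("gs://"):
        raise ValueError("gcs_bucket_url must look like gs://<bucket>[/path]")
    return [
        "--project", project_id,
        # Assumption: the bucket URL is supplied as Beam's temp location.
        "--temp_location", gcs_bucket_url,
    ]


# Placeholder values for illustration only.
beam_args = gcp_beam_args("my-gcp-project", "gs://my-bucket/tmp")

# In the notebook, the args would be passed along these lines:
# example_gen = context.run(
#     BigQueryExampleGen(query=query),
#     beam_pipeline_args=beam_args,
# )
```

Keeping the arguments in one list makes it easy to reuse them for other Beam-based components in the same notebook.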