Vertex workbench - how to run BigQueryExampleGen in Jupyter notebook


Problem

Tried to run BigQueryExampleGen in a Jupyter notebook on Vertex AI Workbench and got the error below.

InvalidUserInputError: Request missing required parameter projectId [while running 'InputToRecord/QueryTable/ReadFromBigQuery/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']

Steps

Set up the GCP project credentials and the interactive TFX context.

import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "path_to_credential_file"


from tfx.v1.extensions.google_cloud_big_query import BigQueryExampleGen
from tfx.v1.components import (
    StatisticsGen,
    SchemaGen,
)
from tfx.orchestration.experimental.interactive.interactive_context import InteractiveContext
%load_ext tfx.orchestration.experimental.interactive.notebook_extensions.skip
context = InteractiveContext(pipeline_root='./data/artifacts')
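Before running the pipeline, it can help to sanity-check the credential setup, since a missing or wrong GOOGLE_APPLICATION_CREDENTIALS path also surfaces as confusing downstream errors. A minimal sketch (the helper name is hypothetical, not part of TFX):

```python
import os

def check_gcp_setup(credential_env="GOOGLE_APPLICATION_CREDENTIALS"):
    """Return a list of problems found with the local GCP setup (empty if OK)."""
    problems = []
    path = os.environ.get(credential_env)
    if not path:
        problems.append(f"{credential_env} is not set")
    elif not os.path.exists(path):
        problems.append(f"credential file not found: {path}")
    return problems
```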

Run the BigQueryExampleGen.

query = """
SELECT 
    * EXCEPT (trip_start_timestamp, ML_use)
FROM 
    {PROJECT_ID}.public_dataset.chicago_taxitrips_prep
""".format(PROJECT_ID=PROJECT_ID)

example_gen = context.run(
    BigQueryExampleGen(query=query)
)

Got the error.

InvalidUserInputError: Request missing required parameter projectId [while running 'InputToRecord/QueryTable/ReadFromBigQuery/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']

Data

See mlops-with-vertex-ai/01-dataset-management.ipynb to set up the BigQuery dataset for the Chicago Taxi Trips data.

1 Answer

Project ID

To run against GCP, the project ID needs to be provided via the beam_pipeline_args argument.

I have proposed #888 to make this work. With that change, you would be able to do:

context.run(..., beam_pipeline_args=['--project', 'my-project'])
query = """
SELECT 
    * EXCEPT (trip_start_timestamp, ML_use)
FROM 
    {PROJECT_ID}.public_dataset.chicago_taxitrips_prep
""".format(PROJECT_ID=PROJECT_ID)

example_gen = context.run(
    BigQueryExampleGen(query=query),
    beam_pipeline_args=[
        '--project', PROJECT_ID,
    ]
)
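Why --project alone is not enough can be seen by mimicking how Beam reads these flags. A self-contained sketch, using argparse in place of Beam's actual PipelineOptions machinery (parse_beam_args is a hypothetical helper, not a Beam or TFX API):

```python
import argparse

def parse_beam_args(beam_pipeline_args):
    """Illustrative only: Beam itself parses these flags via PipelineOptions;
    argparse stands in here so the sketch is self-contained."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--project")
    parser.add_argument("--temp_location")
    known, _unknown = parser.parse_known_args(beam_pipeline_args)
    return known

opts = parse_beam_args(["--project", "my-project"])
# opts.project is set, but opts.temp_location is still None, which is
# why ReadFromBigQuery goes on to complain about a missing GCS location.
```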

However, it still fails with another error.

ValueError: ReadFromBigQuery requires a GCS location to be provided. Neither gcs_location in the constructor nor the fallback option --temp_location is set. [while running 'InputToRecord/QueryTable/ReadFromBigQuery/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/SplitAndSizeRestriction']

GCS Bucket

It looks like inside GCP the interactive context runs the BigQueryExampleGen via Dataflow, so a GCS bucket URL also needs to be provided via the beam_pipeline_args argument.

When running your Dataflow pipeline, pass the argument --temp_location gs://bucket/subfolder/.

query = """
SELECT 
    * EXCEPT (trip_start_timestamp, ML_use)
FROM 
    {PROJECT_ID}.public_dataset.chicago_taxitrips_prep
""".format(PROJECT_ID=PROJECT_ID)

example_gen = context.run(
    BigQueryExampleGen(query=query),
    beam_pipeline_args=[
        '--project', PROJECT_ID,
        '--temp_location', BUCKET
    ]
)
statistics_gen = context.run(
    StatisticsGen(examples=example_gen.component.outputs['examples'])
)
context.show(statistics_gen.component.outputs['statistics'])

schema_gen = SchemaGen(
    statistics=statistics_gen.component.outputs['statistics'],
    infer_feature_shape=True
)
context.run(schema_gen)
context.show(schema_gen.outputs['schema'])


Documentation

This notebook-based tutorial will use Google Cloud BigQuery as a data source to train an ML model. The ML pipeline will be constructed using TFX and run on Google Cloud Vertex Pipelines. In this tutorial, we will use the BigQueryExampleGen component, which reads data from BigQuery into TFX pipelines.

We also need to pass beam_pipeline_args for the BigQueryExampleGen. It includes configs like the name of the GCP project and the temporary storage for the BigQuery execution.
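Putting the two configs together, a small helper can assemble and validate the flags before calling context.run. This is a sketch under the question's assumptions (the function name is mine, not part of TFX or Beam):

```python
def build_beam_pipeline_args(project_id, temp_location):
    """Assemble the Beam flags BigQueryExampleGen needs."""
    if not project_id:
        raise ValueError("project_id is required")
    # temp_location must be a GCS URL; ReadFromBigQuery rejects anything else.
    if not temp_location.startswith("gs://"):
        raise ValueError(f"temp_location must be a gs:// URL, got: {temp_location}")
    return ["--project", project_id, "--temp_location", temp_location]

# Usage with the interactive context (PROJECT_ID/BUCKET as in the question):
# context.run(BigQueryExampleGen(query=query),
#             beam_pipeline_args=build_beam_pipeline_args(PROJECT_ID, BUCKET))
```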