- I am trying to run a simple TFX pipeline on GCP with the Kubeflow runner (KubeflowV2DagRunner), following this tutorial: https://www.tensorflow.org/tfx/tutorials/tfx/gcp/vertex_pipelines_bq
- I also tried the same code with LocalDagRunner.
Ideally, I would like to get it working with the Kubeflow runner.
I have the following TF/KF versions:
TensorFlow version: 2.11.0
TFX version: 1.12.0
KFP version: 1.8.22
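To confirm what the notebook kernel actually sees (as opposed to what pip reports in a terminal), a small check along these lines can be run in a cell. This is only a sketch: `installed_versions` is a hypothetical helper name, and on Python 3.7 it assumes the `importlib_metadata` backport is installed.

```python
import sys

try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is installed.
    import importlib_metadata as metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # The Python version matters too, since packages restrict which versions they support.
    print("Python:", sys.version.split()[0])
    for name, version in installed_versions(["tensorflow", "tfx", "kfp"]).items():
        print(name, "->", version or "not installed")
```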
I tried to upgrade TFX to version 1.13.0 in my GCP notebook, but I get the error "No matching distribution found for tfx==1.13.0". The latest version available to me is 1.12.0.
When I try to run the pipeline with the Kubeflow runner:
import os
from tfx import v1 as tfx

# GOOGLE_CLOUD_PROJECT, GCS_BUCKET_NAME, PIPELINE_NAME, PIPELINE_ROOT, QUERY,
# MODULE_ROOT, SERVING_MODEL_DIR, _trainer_module_file and _create_pipeline
# are defined in earlier notebook cells.
BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
]

PIPELINE_DEFINITION_FILE = PIPELINE_NAME + '_pipeline.json'

runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
    config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
    output_filename=PIPELINE_DEFINITION_FILE)
_ = runner.run(
    _create_pipeline(
        pipeline_name=PIPELINE_NAME,
        pipeline_root=PIPELINE_ROOT,
        query=QUERY,
        module_file=os.path.join(MODULE_ROOT, _trainer_module_file),
        serving_model_dir=SERVING_MODEL_DIR,
        beam_pipeline_args=BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS))
I get the error "module 'tfx.v1.orchestration.experimental' has no attribute 'KubeflowV2DagRunner'":
AttributeError Traceback (most recent call last)
/var/tmp/ipykernel_34085/2.....py in <module>
11 PIPELINE_DEFINITION_FILE = PIPELINE_NAME + '_pipeline.json'
12
---> 13 runner = tfx.orchestration.experimental.KubeflowV2DagRunner(
14 config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
15 output_filename=PIPELINE_DEFINITION_FILE)
AttributeError: module 'tfx.v1.orchestration.experimental' has no attribute 'KubeflowV2DagRunner'
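To narrow this down, one thing I can check is whether the attribute is really absent and whether importing the runner's own module fails with a more specific error. The sketch below is generic. `diagnose_attribute` is a hypothetical helper name, and the deeper TFX module path in the usage note is my assumption about where the runner is defined; the traceback does not confirm it.

```python
import importlib


def diagnose_attribute(module_name, attr_name):
    """Report whether module_name can be imported and exposes attr_name."""
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        # The import itself failed; the message usually names the missing dependency.
        return "import failed: {}".format(exc)
    if hasattr(module, attr_name):
        return "available"
    return "missing"
```

In the notebook this would be called as `diagnose_attribute('tfx.v1.orchestration.experimental', 'KubeflowV2DagRunner')`, followed by `diagnose_attribute('tfx.orchestration.kubeflow.v2.kubeflow_v2_dag_runner', 'KubeflowV2DagRunner')` (module path assumed). An "import failed" result for the second call would suggest a missing or incompatible dependency rather than a renamed API.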
When trying to run using LocalDagRunner:
BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS = [
    '--project=' + GOOGLE_CLOUD_PROJECT,
    '--temp_location=' + os.path.join('gs://', GCS_BUCKET_NAME, 'tmp'),
]

PIPELINE_DEFINITION_FILE = PIPELINE_NAME + '_pipeline.json'

runner = tfx.orchestration.LocalDagRunner(
    # config=tfx.orchestration.experimental.KubeflowV2DagRunnerConfig(),
    # output_filename=PIPELINE_DEFINITION_FILE
)
_ = runner.run(
    _create_pipeline(
        pipeline_name=PIPELINE_NAME,
        pipeline_root=PIPELINE_ROOT,
        query=QUERY,
        module_file=os.path.join(MODULE_ROOT, _trainer_module_file),
        serving_model_dir=SERVING_MODEL_DIR,
        beam_pipeline_args=BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS))
I get the error "getattr(): attribute name must be string":
WARNING:absl:metadata_connection_config is not provided by IR.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/var/tmp/ipykernel_34085/4.....py in <module>
17 module_file=os.path.join(MODULE_ROOT, _trainer_module_file),
18 serving_model_dir=SERVING_MODEL_DIR,
---> 19 beam_pipeline_args=BIG_QUERY_WITH_DIRECT_RUNNER_BEAM_PIPELINE_ARGS))
~/tfx_env/lib/python3.7/site-packages/tfx/orchestration/portable/tfx_runner.py in run(self, pipeline, run_options, **kwargs)
122 else:
123 run_options_pb = None
--> 124 return self.run_with_ir(pipeline_pb, run_options=run_options_pb, **kwargs)
~/tfx_env/lib/python3.7/site-packages/tfx/orchestration/local/local_dag_runner.py in run_with_ir(self, pipeline, run_options)
64 deployment_config.metadata_connection_config,
65 deployment_config.metadata_connection_config.WhichOneof(
---> 66 'connection_config'))
67
68 logging.info('Using deployment config:\n %s', deployment_config)
TypeError: getattr(): attribute name must be string
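Reading the traceback, the failing call appears to be `getattr(config, config.WhichOneof('connection_config'))`. If no field of that oneof is set, `WhichOneof` returns `None`, so `getattr` receives `None` as the attribute name. That would fit the warning that `metadata_connection_config` is not provided. The stand-in class below is purely illustrative (`FakeConnectionConfig` and `resolve_connection_config` are hypothetical names, not TFX or protobuf types) and only reproduces that mechanism:

```python
class FakeConnectionConfig:
    """Hypothetical stand-in for a protobuf message whose oneof has no field set."""

    def WhichOneof(self, oneof_name):
        # Protobuf messages return None here when no field of the oneof is set.
        return None


def resolve_connection_config(config):
    """Mimic the lookup pattern shown in the traceback."""
    return getattr(config, config.WhichOneof("connection_config"))


try:
    resolve_connection_config(FakeConnectionConfig())
except TypeError as exc:
    # The exact message wording differs between Python versions.
    print("TypeError:", exc)
```

If this reading is right, the pipeline handed to LocalDagRunner has no metadata connection configured. I have not yet verified that against my `_create_pipeline`.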