Kubeflow Pipeline Error: 'Error (exit code 1): no such file or directory' for saved_model.pb.tgz


I'm working on a Kubeflow pipeline and encountering an error. The error message is as follows: "This step is in Error state with this message: Error (exit code 1): open /var/run/argo/outputs/artifacts/app/results_FC_shift_cycle/model/saved_model.pb.tgz: no such file or directory."

I'm trying to run a Kubeflow pipeline, and this specific step is failing. The step is supposed to generate a saved model.

I'm using the following code for this step:

import kfp
import kfp.dsl as dsl
from utils.utils import upload_pipe, compile
import argparse

parser = argparse.ArgumentParser(description='getting GitLab CI pipeline arguments')
parser.add_argument('--build_id', help='build id of the pipeline')
parser.add_argument('--commit_id', help='commit id of the pipeline')
parser.add_argument('--build_url', help='build url of the pipeline')
args = parser.parse_args()
build_id = args.build_id
commit_id = args.commit_id
build_url = args.build_url

# Define the pipeline function and its inputs and outputs
def shift_cycle_pipeline():
    # Create a Docker container with the necessary dependencies and the shift_cycle_kfp.py script
    return dsl.ContainerOp(
        name='shift-cycle',
        image=f'europe-west9-docker.pkg.dev/project_id/cycle:{build_id}',
        command=['python', '/app/main.py'],
        file_outputs={
            'output_mlmodel': '/app/results_FC_shift_cycle/model/saved_model.pb',
        }
    )
@dsl.pipeline(
    name="Shift Cycle Training Pipeline",
    description="A pipeline that uses a custom Docker image for Model Training."
    )
def shift_cycle():
    train_task = shift_cycle_pipeline()
    train_task.container.set_image_pull_policy("Always")
     
pipeline_file = "shift_cycle.yaml"
pipeline_name = 'shift_cycle'  # Specify a name for your pipeline
experiment_name = 'shift_cycle'
namespace = "kubeflow-user-example-com"
# Compile the pipeline
compile(shift_cycle, pipeline_file)
# Run pipeline
pipe_desc = f'This is the GitLab CI training pipeline info:\nCommit ID is: {commit_id}\nBuild ID is: {build_id}\nPipeline Build URL is: {build_url}'
upload_pipe(pipeline_file, pipeline_name, experiment_name, namespace, pipeline_version_name=build_id, pipe_desc=pipe_desc, run=True)
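To narrow this down, something like the following could be added at the end of main.py (just a debugging sketch on my side, not part of the original script; the directory path is copied from the file_outputs mapping above) to log whether the file is actually written before the container exits:

import os

# Debugging sketch: list the expected output directory so the pod logs show
# whether saved_model.pb was actually created before the step finishes.
out_dir = '/app/results_FC_shift_cycle/model'  # same path as in file_outputs
if os.path.isdir(out_dir):
    print('Output directory contents:', os.listdir(out_dir))
else:
    print('Output directory does not exist:', out_dir)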


I've run the Docker image manually and it works without any problems.

The main script includes this step:

from google.cloud import storage

def upload_model():
    print('UPLOAD MODEL TO GCP......')
    gcp_json = '/app/service_account_key.json'
    bucket_name = 'bucketname'
    filename = 'results_FC_shift_cycle/model/saved_model.pb'
    # check if the model exists at 'results_FC_shift_cycle/model/saved_model.pb'
    # if os.path.exists('results_FC_shift_cycle/model/saved_model.pb'):
    print('Model exists')
    # Authenticate with the service account key and upload the model file to the bucket
    client = storage.Client.from_service_account_json(gcp_json)
    bucket = client.get_bucket(bucket_name)
    blob = bucket.blob(filename)
    blob.upload_from_filename('/app/results_FC_shift_cycle/model/saved_model.pb')
    print('Model uploaded to GCP')
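Restoring that commented-out existence check, a rough sketch could look like this (self-contained; the error message wording is mine), so the step would fail with an explicit message instead of letting the upload or Argo's artifact packaging fail on a missing file:

import os

model_path = '/app/results_FC_shift_cycle/model/saved_model.pb'
# Guard sketch: stop early with a clear error if the model was never written.
if not os.path.exists(model_path):
    raise FileNotFoundError(f'Expected model file not found: {model_path}')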

This is the Dockerfile I use to build the image:

FROM python:3.9-slim-buster

# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
COPY . /app


# Create a virtual environment and activate it
RUN python -m venv venv
ENV PATH="/app/venv/bin:$PATH"

# Install required packages
RUN pip install --no-cache-dir --trusted-host pypi.python.org -r requirements.txt

# Run the script
CMD ["python", "main.py"]

After the training step, the script should run a function to upload the model to Google Cloud Storage (GCS), but this is not happening.

I expect the pipeline to generate the 'saved_model.pb.tgz' file at the specified path and upload it to Google Cloud Storage.
