AWS SageMaker error when deploying pre-trained PyTorch model: "%s already exists"


I am following this example: https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-script-mode/pytorch_bert/deploy_bert_outputs.html

When I deploy my pre-trained model, I run into an error.

I have the following notebook:

import os

model_path = "model/"
code_path = "code/"

import tarfile

zipped_model_path = os.path.join(model_path, "model.tar.gz")

with tarfile.open(zipped_model_path, "w:gz") as tar:
    tar.add(model_path)
    tar.add(code_path)

from sagemaker.pytorch import PyTorchModel
from sagemaker import get_execution_role
import time

endpoint_name = "yolo-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())

model = PyTorchModel(
    entry_point="inference_code.py",
    model_data=zipped_model_path,
    role=get_execution_role(),
    framework_version="1.5",
    py_version="py3",
)

predictor = model.deploy(
    initial_instance_count=1, instance_type="ml.t2.medium", endpoint_name=endpoint_name
)

On the model.deploy call, I encounter this error:

UnexpectedStatusException: Error hosting endpoint yolo-2024-03-24-21-49-05: Failed. Reason: The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.

In the CloudWatch logs, I see:

ERROR - %s already exists. Please specify --force/-f option to overwrite the model archive output file. See -h/--help for more details./.sagemaker/mms/models/model

What does this error mean, and how can I fix it? I don't see anywhere I could pass a -f flag, for example.
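
One thing that may be worth checking is the layout of model.tar.gz. The notebook above adds model/ and code/ under their local directory names, so every model file ends up under a model/ prefix inside the archive. The SageMaker PyTorch documentation describes a layout with the model files at the archive root and the inference script under code/. The sketch below packages the archive that way. It is only an illustration: build_model_archive is a hypothetical helper, not part of the SageMaker SDK, and whether the layout is the actual cause of the "%s already exists" message is an assumption that has not been confirmed.

```python
import os
import tarfile


def build_model_archive(model_dir, code_dir, output_path):
    """Write a .tar.gz with model files at the root and scripts under code/.

    Hypothetical helper for illustration; not part of the SageMaker SDK.
    """
    with tarfile.open(output_path, "w:gz") as tar:
        # Add each model artifact at the archive root (no "model/" prefix),
        # skipping any archive left over from an earlier run.
        for name in sorted(os.listdir(model_dir)):
            if name.endswith(".tar.gz"):
                continue
            tar.add(os.path.join(model_dir, name), arcname=name)
        # Store the inference scripts under "code/" inside the archive,
        # whatever the local directory is called.
        tar.add(code_dir, arcname="code")
    return output_path


if __name__ == "__main__":
    # Paths taken from the notebook above; the archive is written next to
    # the model/ directory rather than inside it.
    archive = build_model_archive("model/", "code/", "model.tar.gz")
    with tarfile.open(archive) as tar:
        print(tar.getnames())
```

The returned path could then be passed as model_data in place of zipped_model_path. Listing the archive contents with tar.getnames() before deploying is a cheap way to confirm what the container will actually unpack.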
