MlflowException: Unable to download model artifacts in Databricks while registering model with MLflow


I am attempting to log, register, and deploy a fine-tuned GPT-2 model in Databricks. My logging code runs, but when I run my registration code I get an MlflowException.

Here is my model logging code.

import mlflow

mlflow.set_registry_uri("databricks-uc")

with mlflow.start_run() as run:
    mlflow.transformers.log_model(
        transformers_model=pipeline,
        artifact_path="gpt2",
        registered_model_name=registered_model_name,
        input_example=input_example,
        signature=signature,
        task="text-generation",
        inference_config=inference_config,
        await_registration_for=60 * 60,
    )

And here is my registration code:

mlflow.set_registry_uri("databricks-uc")
mlflow.set_tracking_uri("databricks")

result = mlflow.register_model(
    model_uri="runs:/"+run.info.run_id+"/model",
    name=registered_name,
    await_registration_for=1000,
)

Here is the full traceback, lightly edited.

MlflowException                           Traceback (most recent call last)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:483, in UcModelRegistryStore._local_model_dir(self, source, local_model_path)
    482 try:
--> 483     local_model_dir = mlflow.artifacts.download_artifacts(
    484         artifact_uri=source, tracking_uri=self.tracking_uri
    485     )
    486 except Exception as e:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/artifacts/__init__.py:60, in download_artifacts(artifact_uri, run_id, artifact_path, dst_path, tracking_uri)
     59 if artifact_uri is not None:
---> 60     return _download_artifact_from_uri(artifact_uri, output_path=dst_path)
     62 artifact_path = artifact_path if artifact_path is not None else ""

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/artifact_utils.py:100, in _download_artifact_from_uri(artifact_uri, output_path)
     99 root_uri, artifact_path = _get_root_uri_and_artifact_path(artifact_uri)
--> 100 return get_artifact_repository(artifact_uri=root_uri).download_artifacts(
    101     artifact_path=artifact_path, dst_path=output_path
    102 )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/artifact/artifact_repo.py:221, in ArtifactRepository.download_artifacts(self, artifact_path, dst_path)
    218     failures = "\n".join(
    219         template.format(path=path, error=error) for path, error in failed_downloads.items()
    220     )
--> 221     raise MlflowException(
    222         message=(
    223             "The following failures occurred while downloading one or more"
    224             f" artifacts from {self.artifact_uri}:\n{_truncate_error(failures)}"
    225         )
    226     )
    228 return os.path.join(dst_path, artifact_path)

MlflowException: The following failures occurred while downloading one or more artifacts from dbfs:/databricks/mlflow-tracking/.../artifacts:
##### File model #####
404 Client Error: Not Found for url: https://$DATABRICKSURL/8188181812650195.jobs/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model... Response text: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>$PATH/8188181812650195.jobs/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model</Key><RequestId>$REQUESTID</RequestId><HostId>$HOSTID</HostId></Error>

The above exception was the direct cause of the following exception:

MlflowException                           Traceback (most recent call last)
File <command-2982154088058438>, line 76
---> 76 result = mlflow.register_model(
     77     "runs:/"+run.info.run_id+"/model",
     78     name=registered_name,
     79     await_registration_for=1000,
     80 )
     82 from mlflow import MlflowClient
     83 client = MlflowClient(registry_uri="databricks-uc")

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/_model_registry/fluent.py:73, in register_model(model_uri, name, await_registration_for, tags)
     17 def register_model(
     18     model_uri,
     19     name,
   (...)
     22     tags: Optional[Dict[str, Any]] = None,
     23 ) -> ModelVersion:
     24     """
     25     Create a new model version in model registry for the model files specified by ``model_uri``.
     26     Note that this method assumes the model registry backend URI is the same as that of the
   (...)
     71         Version: 1
     72     """
---> 73     return _register_model(
     74         model_uri=model_uri, name=name, await_registration_for=await_registration_for, tags=tags
     75     )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/_model_registry/fluent.py:108, in _register_model(model_uri, name, await_registration_for, tags, local_model_path)
    105     source = RunsArtifactRepository.get_underlying_uri(model_uri)
    106     (run_id, _) = RunsArtifactRepository.parse_runs_uri(model_uri)
--> 108 create_version_response = client._create_model_version(
    109     name=name,
    110     source=source,
    111     run_id=run_id,
    112     tags=tags,
    113     await_creation_for=await_registration_for,
    114     local_model_path=local_model_path,
    115 )
    116 eprint(
    117     f"Created version '{create_version_response.version}' of model "
    118     f"'{create_version_response.name}'."
    119 )
    120 return create_version_response

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/client.py:2575, in MlflowClient._create_model_version(self, name, source, run_id, tags, run_link, description, await_creation_for, local_model_path)
   2567     # NOTE: we can't easily delete the target temp location due to the async nature
   2568     # of the model version creation - printing to let the user know.
   2569     eprint(
   2570         f"=== Source model files were copied to {new_source}"
   2571         + " in the model registry workspace. You may want to delete the files once the"
   2572         + " model version is in 'READY' status. You can also find this location in the"
   2573         + " `source` field of the created model version. ==="
   2574     )
-> 2575 return self._get_registry_client().create_model_version(
   2576     name=name,
   2577     source=new_source,
   2578     run_id=run_id,
   2579     tags=tags,
   2580     run_link=run_link,
   2581     description=description,
   2582     await_creation_for=await_creation_for,
   2583     local_model_path=local_model_path,
   2584 )

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/_model_registry/client.py:196, in ModelRegistryClient.create_model_version(self, name, source, run_id, tags, run_link, description, await_creation_for, local_model_path)
    194 arg_names = _get_arg_names(self.store.create_model_version)
    195 if "local_model_path" in arg_names:
--> 196     mv = self.store.create_model_version(
    197         name,
    198         source,
    199         run_id,
    200         tags,
    201         run_link,
    202         description,
    203         local_model_path=local_model_path,
    204     )
    205 else:
    206     # Fall back to calling create_model_version without
    207     # local_model_path since old model registry store implementations may not
    208     # support the local_model_path argument.
    209     mv = self.store.create_model_version(name, source, run_id, tags, run_link, description)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:545, in UcModelRegistryStore.create_model_version(self, name, source, run_id, tags, run_link, description, local_model_path)
    543     extra_headers = {_DATABRICKS_LINEAGE_ID_HEADER: header_base64}
    544 full_name = get_full_name_from_sc(name, self.spark)
--> 545 with self._local_model_dir(source, local_model_path) as local_model_dir:
    546     self._validate_model_signature(local_model_dir)
    547     feature_deps = get_feature_dependencies(local_model_dir)

File /usr/lib/python3.10/contextlib.py:135, in _GeneratorContextManager.__enter__(self)
    133 del self.args, self.kwds, self.func
    134 try:
--> 135     return next(self.gen)
    136 except StopIteration:
    137     raise RuntimeError("generator didn't yield") from None

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:487, in UcModelRegistryStore._local_model_dir(self, source, local_model_path)
    483     local_model_dir = mlflow.artifacts.download_artifacts(
    484         artifact_uri=source, tracking_uri=self.tracking_uri
    485     )
    486 except Exception as e:
--> 487     raise MlflowException(
    488         f"Unable to download model artifacts from source artifact location "
    489         f"'{source}' in order to upload them to Unity Catalog. Please ensure "
    490         f"the source artifact location exists and that you can download from "
    491         f"it via mlflow.artifacts.download_artifacts()"
    492     ) from e
    493 # Clean up temporary model directory at end of block. We assume a temporary
    494 # model directory was created if the `source` is not a local path (must be downloaded
    495 # from remote to a temporary directory)
    496 yield local_model_dir

MlflowException: Unable to download model artifacts from source artifact location 'dbfs:/databricks/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model' in order to upload them to Unity Catalog. Please ensure the source artifact location exists and that you can download from it via mlflow.artifacts.download_artifacts()

When I open the DBFS file browser, I don't see any folder called 'databricks', so I looked through the filesystem with terminal commands instead. When I run %ls /dbfs/databricks/ I can see two directories: mlflow-registry and mlflow-tracking. When I run %ls /dbfs/databricks/mlflow-tracking/ or %ls /dbfs/databricks/mlflow-registry/, though, I get this error: mount.err*. Granted, I didn't try this on a Unity Catalog-enabled cluster, but I don't think I need one to browse DBFS. Also, at no point in the process do I mount a directory, but we are using Databricks through AWS, so that connection is probably where things are going wrong.

I then tried the full path straight from the error message:

%ls /dbfs/databricks/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model

and I got this error:

ls: cannot access '/dbfs/databricks/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model': No such file or directory

This suggests that the path does not actually exist after all.

From here I'm at a loss for what to do. I followed the Databricks example code and it worked, but with my own model the registration step fails as shown above. I am out of ideas for where to go next, so I'd really appreciate any and all tips.
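One further check that does not depend on browsing DBFS is to ask the tracking server which artifacts the run actually contains and compare them with the path used in the runs:/ URI. In the snippets above the model is logged with artifact_path="gpt2" while the registration URI ends in /model, so this comparison may be worth making. The following is only a sketch: the helper names split_runs_uri, check_model_uri, and list_logged_paths are hypothetical, it assumes a single-level artifact path, and list_logged_paths assumes MLflow is installed and the notebook can reach the tracking server.

```python
from typing import List, Optional, Tuple


def split_runs_uri(model_uri: str) -> Tuple[str, str]:
    """Split 'runs:/<run_id>/<artifact_path>' into (run_id, artifact_path)."""
    prefix = "runs:/"
    if not model_uri.startswith(prefix):
        raise ValueError(f"not a runs:/ URI: {model_uri!r}")
    run_id, _, artifact_path = model_uri[len(prefix):].partition("/")
    return run_id, artifact_path.strip("/")


def check_model_uri(model_uri: str, logged_paths: List[str]) -> Optional[str]:
    """Return None if the URI's artifact path was logged, else a hint string.

    Assumption: the artifact path is a single top-level directory name.
    """
    _, artifact_path = split_runs_uri(model_uri)
    if artifact_path in logged_paths:
        return None
    return (
        f"'{artifact_path}' is not among the run's top-level artifacts "
        f"{logged_paths}; the model may have been logged under a different "
        "artifact_path."
    )


def list_logged_paths(run_id: str) -> List[str]:
    """List the run's top-level artifact paths via the tracking server."""
    # Imported lazily so the pure helpers above work without MLflow installed.
    from mlflow import MlflowClient

    return [info.path for info in MlflowClient().list_artifacts(run_id)]


# Possible use in the notebook, after the logging cell has run:
#   run_id = run.info.run_id
#   print(check_model_uri(f"runs:/{run_id}/model", list_logged_paths(run_id)))
```

If the hint is printed, the directory named in the URI was never logged for that run, which would be consistent with the NoSuchKey response in the traceback.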
