I am attempting to log, register, and deploy a fine-tuned GPT-2 model in Databricks. My logging code runs, but when I run my registration code I get an MlflowException.
Here is my model logging code:
    mlflow.set_registry_uri("databricks-uc")
    with mlflow.start_run() as run:
        mlflow.transformers.log_model(
            transformers_model=pipeline,
            artifact_path="gpt2",
            registered_model_name=registered_model_name,
            input_example=input_example,
            signature=signature,
            task="text-generation",
            inference_config=inference_config,
            await_registration_for=60 * 60,
        )
And here is my registration code:
    mlflow.set_registry_uri("databricks-uc")
    mlflow.set_tracking_uri("databricks")
    result = mlflow.register_model(
        model_uri="runs:/" + run.info.run_id + "/model",
        name=registered_name,
        await_registration_for=1000,
    )
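One detail worth checking in the two snippets above: the logging call stores the model under artifact_path="gpt2", while the URI passed to register_model ends in /model. A runs:/ URI has the form runs:/<run_id>/<artifact_path>, so the two values need to agree. The following is a minimal sketch of building the URI from a single shared value; ARTIFACT_PATH and build_model_uri are hypothetical names introduced for illustration, not part of MLflow.

```python
# Hypothetical constant: the same value that is passed to log_model(artifact_path=...).
ARTIFACT_PATH = "gpt2"


def build_model_uri(run_id: str, artifact_path: str = ARTIFACT_PATH) -> str:
    """Return a runs:/ URI that points at the artifact path used when logging."""
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    # Strip stray slashes so "/gpt2/" and "gpt2" produce the same URI.
    return f"runs:/{run_id}/{artifact_path.strip('/')}"
```

Under that assumption, the registration call would become mlflow.register_model(model_uri=build_model_uri(run.info.run_id), name=registered_name, await_registration_for=1000). Since log_model is already given registered_model_name, it may also have created a model version on its own, in which case the separate register_model call could be unnecessary.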
Here is the full traceback, lightly edited.
MlflowException Traceback (most recent call last)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:483, in UcModelRegistryStore._local_model_dir(self, source, local_model_path)
482 try:
--> 483 local_model_dir = mlflow.artifacts.download_artifacts(
484 artifact_uri=source, tracking_uri=self.tracking_uri
485 )
486 except Exception as e:
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/artifacts/__init__.py:60, in download_artifacts(artifact_uri, run_id, artifact_path, dst_path, tracking_uri)
59 if artifact_uri is not None:
---> 60 return _download_artifact_from_uri(artifact_uri, output_path=dst_path)
62 artifact_path = artifact_path if artifact_path is not None else ""
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/artifact_utils.py:100, in _download_artifact_from_uri(artifact_uri, output_path)
99 root_uri, artifact_path = _get_root_uri_and_artifact_path(artifact_uri)
--> 100 return get_artifact_repository(artifact_uri=root_uri).download_artifacts(
101 artifact_path=artifact_path, dst_path=output_path
102 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/artifact/artifact_repo.py:221, in ArtifactRepository.download_artifacts(self, artifact_path, dst_path)
218 failures = "\n".join(
219 template.format(path=path, error=error) for path, error in failed_downloads.items()
220 )
--> 221 raise MlflowException(
222 message=(
223 "The following failures occurred while downloading one or more"
224 f" artifacts from {self.artifact_uri}:\n{_truncate_error(failures)}"
225 )
226 )
228 return os.path.join(dst_path, artifact_path)
MlflowException: The following failures occurred while downloading one or more artifacts from dbfs:/databricks/mlflow-tracking/.../artifacts:
##### File model #####
404 Client Error: Not Found for url: https://$DATABRICKSURL/8188181812650195.jobs/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model... Response text: <?xml version="1.0" encoding="UTF-8"?>
<Error><Code>NoSuchKey</Code><Message>The specified key does not exist.</Message><Key>$PATH/8188181812650195.jobs/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model</Key><RequestId>$REQUESTID</RequestId><HostId>$HOSTID</HostId></Error>
The above exception was the direct cause of the following exception:
MlflowException Traceback (most recent call last)
File <command-2982154088058438>, line 76
---> 76 result = mlflow.register_model(
77 "runs:/"+run.info.run_id+"/model",
78 name=registered_name,
79 await_registration_for=1000,
80 )
82 from mlflow import MlflowClient
83 client = MlflowClient(registry_uri="databricks-uc")
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/_model_registry/fluent.py:73, in register_model(model_uri, name, await_registration_for, tags)
17 def register_model(
18 model_uri,
19 name,
(...)
22 tags: Optional[Dict[str, Any]] = None,
23 ) -> ModelVersion:
24 """
25 Create a new model version in model registry for the model files specified by ``model_uri``.
26 Note that this method assumes the model registry backend URI is the same as that of the
(...)
71 Version: 1
72 """
---> 73 return _register_model(
74 model_uri=model_uri, name=name, await_registration_for=await_registration_for, tags=tags
75 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/_model_registry/fluent.py:108, in _register_model(model_uri, name, await_registration_for, tags, local_model_path)
105 source = RunsArtifactRepository.get_underlying_uri(model_uri)
106 (run_id, _) = RunsArtifactRepository.parse_runs_uri(model_uri)
--> 108 create_version_response = client._create_model_version(
109 name=name,
110 source=source,
111 run_id=run_id,
112 tags=tags,
113 await_creation_for=await_registration_for,
114 local_model_path=local_model_path,
115 )
116 eprint(
117 f"Created version '{create_version_response.version}' of model "
118 f"'{create_version_response.name}'."
119 )
120 return create_version_response
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/client.py:2575, in MlflowClient._create_model_version(self, name, source, run_id, tags, run_link, description, await_creation_for, local_model_path)
2567 # NOTE: we can't easily delete the target temp location due to the async nature
2568 # of the model version creation - printing to let the user know.
2569 eprint(
2570 f"=== Source model files were copied to {new_source}"
2571 + " in the model registry workspace. You may want to delete the files once the"
2572 + " model version is in 'READY' status. You can also find this location in the"
2573 + " `source` field of the created model version. ==="
2574 )
-> 2575 return self._get_registry_client().create_model_version(
2576 name=name,
2577 source=new_source,
2578 run_id=run_id,
2579 tags=tags,
2580 run_link=run_link,
2581 description=description,
2582 await_creation_for=await_creation_for,
2583 local_model_path=local_model_path,
2584 )
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/tracking/_model_registry/client.py:196, in ModelRegistryClient.create_model_version(self, name, source, run_id, tags, run_link, description, await_creation_for, local_model_path)
194 arg_names = _get_arg_names(self.store.create_model_version)
195 if "local_model_path" in arg_names:
--> 196 mv = self.store.create_model_version(
197 name,
198 source,
199 run_id,
200 tags,
201 run_link,
202 description,
203 local_model_path=local_model_path,
204 )
205 else:
206 # Fall back to calling create_model_version without
207 # local_model_path since old model registry store implementations may not
208 # support the local_model_path argument.
209 mv = self.store.create_model_version(name, source, run_id, tags, run_link, description)
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:545, in UcModelRegistryStore.create_model_version(self, name, source, run_id, tags, run_link, description, local_model_path)
543 extra_headers = {_DATABRICKS_LINEAGE_ID_HEADER: header_base64}
544 full_name = get_full_name_from_sc(name, self.spark)
--> 545 with self._local_model_dir(source, local_model_path) as local_model_dir:
546 self._validate_model_signature(local_model_dir)
547 feature_deps = get_feature_dependencies(local_model_dir)
File /usr/lib/python3.10/contextlib.py:135, in _GeneratorContextManager.__enter__(self)
133 del self.args, self.kwds, self.func
134 try:
--> 135 return next(self.gen)
136 except StopIteration:
137 raise RuntimeError("generator didn't yield") from None
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-a56b0856-4b58-4270-93c1-f4e3d186cf4a/lib/python3.10/site-packages/mlflow/store/_unity_catalog/registry/rest_store.py:487, in UcModelRegistryStore._local_model_dir(self, source, local_model_path)
483 local_model_dir = mlflow.artifacts.download_artifacts(
484 artifact_uri=source, tracking_uri=self.tracking_uri
485 )
486 except Exception as e:
--> 487 raise MlflowException(
488 f"Unable to download model artifacts from source artifact location "
489 f"'{source}' in order to upload them to Unity Catalog. Please ensure "
490 f"the source artifact location exists and that you can download from "
491 f"it via mlflow.artifacts.download_artifacts()"
492 ) from e
493 # Clean up temporary model directory at end of block. We assume a temporary
494 # model directory was created if the `source` is not a local path (must be downloaded
495 # from remote to a temporary directory)
496 yield local_model_dir
MlflowException: Unable to download model artifacts from source artifact location 'dbfs:/databricks/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model' in order to upload them to Unity Catalog. Please ensure the source artifact location exists and that you can download from it via mlflow.artifacts.download_artifacts()
When I open the DBFS file browser, I don't see any folder called 'databricks', so I decided to look through it with terminal commands. When I run %ls /dbfs/databricks/ I can see two directories: mlflow-registry and mlflow-tracking. When I run %ls /dbfs/databricks/mlflow-tracking/ or %ls /dbfs/databricks/mlflow-registry/, though, I get this error: mount.err*. Granted, I didn't try this with a Unity Catalog enabled cluster, but I don't think I need one to browse through DBFS. Also, at no point in the process do I mount a directory, but we are using Databricks through AWS, so that connection is probably where things are going wrong.
I then tried using the full path straight from the error message: %ls /dbfs/databricks/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model and I got the error: ls: cannot access '/dbfs/databricks/mlflow-tracking/2982154088058434/a1cecbee2f8441c09f3fbe5d7a7587ff/artifacts/model': No such file or directory, which suggests that perhaps the file path does not actually exist after all!
From here, though, I'm at a loss as to what to do. I followed the Databricks example code located here and it worked, but for my model things get wonky. I am out of ideas about where to go next, so I'd really appreciate any and all tips.
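Because the ls attempts on /dbfs/databricks/mlflow-tracking were inconclusive, listing the run's artifacts through the MLflow client may be a more reliable way to see which paths actually exist for the run. This is a rough sketch under that assumption. top_level_paths and list_run_artifact_roots are hypothetical helper names; MlflowClient.list_artifacts is the only MLflow call used.

```python
from typing import Iterable, List


def top_level_paths(artifact_paths: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated first component of each artifact path."""
    return sorted(
        {p.strip("/").split("/", 1)[0] for p in artifact_paths if p.strip("/")}
    )


def list_run_artifact_roots(run_id: str) -> List[str]:
    """List the top-level artifact entries of a run via the MLflow tracking API."""
    # Imported lazily so top_level_paths stays usable without MLflow installed.
    from mlflow import MlflowClient

    client = MlflowClient()
    return top_level_paths(info.path for info in client.list_artifacts(run_id))
```

In the notebook this could be called as print(list_run_artifact_roots(run.info.run_id)). If the listing contains gpt2 but no model entry, that would point to the model URI suffix rather than the storage location.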