I'm migrating the existing Hive metastore tables in my Azure Databricks workspace to Unity Catalog (UC), and I've run into an issue when running git clone into a Volume.
My cluster settings are roughly:
- DBR 13.3 LTS
- Mode: Shared (UC enabled)
Earlier, on my non-UC cluster, I had a notebook cell like the following to git clone my repo to a DBFS tmp location:
!git clone https://[email protected]/repo_path /tmp/repo
Now that I'm on a UC-enabled cluster, I want to clone the repo into a Volume instead, so that I can remove the repo directory at the beginning of the notebook with dbutils.fs.rm("/Volumes/catalogname/schemaname/volumename/tmp/repo", True) (that part works). The clone cell looks like this:
!git clone https://[email protected]/repo_path /Volumes/catalogname/schemaname/volumename/tmp/repo
But the clone appears to get stuck at the "Resolving deltas" step.
Has anyone faced this issue and found a solution? I'm thinking the git clone may have to be done differently now; my last option would be to put the git clone command in an init script and have the UC-enabled cluster run it at startup.
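One way of doing the clone "differently" that might be worth trying is to clone into a scratch directory on local disk and only then copy the finished tree into the Volume. The sketch below is untested on a Shared access mode cluster. It assumes git is on the PATH, that the driver's temp directory is writable, and that the Volume is reachable as a normal path under /Volumes. The function name clone_then_copy and its parameters are made up for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def clone_then_copy(repo_url: str, dest: str) -> None:
    """Clone repo_url into a local scratch dir, then copy the tree to dest.

    Hypothetical helper: assumes git is installed and that dest (for
    example a path under /Volumes/...) is writable as a regular path.
    """
    with tempfile.TemporaryDirectory() as scratch:
        local_repo = Path(scratch) / "repo"
        # check=True raises CalledProcessError if git exits non-zero.
        subprocess.run(["git", "clone", repo_url, str(local_repo)], check=True)
        # Remove any previous copy; ignore_errors covers the first run,
        # when dest does not exist yet.
        shutil.rmtree(dest, ignore_errors=True)
        # copytree creates dest and copies everything, including .git.
        shutil.copytree(local_repo, dest)
```

In the notebook this would be called as clone_then_copy(repo_url, "/Volumes/catalogname/schemaname/volumename/tmp/repo"), with repo_url being the same URL passed to git clone above. Whether this avoids the hang is an open question; it only moves Git's own writes off the Volume.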
I found a workaround that solves the issue I originally posted. I modified a CI/CD Azure DevOps pipeline I already had running, which in my case runs on the same repository I need to clone, but it can also clone external repositories.
First, I added a new task to the build stage that copies the repository into a directory, so that the next task publishes that directory as an artifact.
Then, in the deploy stage (you need a download-artifact step too), I added an AzureFileCopy@5 task that copies that directory (i.e. my repository) into my ADLS (Azure Data Lake Storage) location. This is the same location my Databricks UC Volume has access to, so the repository shows up in the UC Volume.
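To confirm from a notebook that the pipeline actually landed the files, something like the following can list what is under the repository directory. It is a minimal sketch that assumes the Volume is readable as a regular path under /Volumes. The helper name list_repo_files and the limit parameter are made up for illustration.

```python
import os


def list_repo_files(root: str, limit: int = 20) -> list[str]:
    """Return up to `limit` file paths under root, relative to root.

    Hypothetical helper: paths come back in top-down walk order, with
    directory and file names sorted alphabetically at each level.
    """
    found: list[str] = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes the walk order deterministic
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            found.append(os.path.relpath(full_path, root))
            if len(found) >= limit:
                return found
    return found
```

Calling list_repo_files("/Volumes/catalogname/schemaname/volumename/tmp/repo") in a cell should then return the first few files of the repository. An empty list means either that the copy did not land where the Volume points or that the path does not exist, since os.walk silently yields nothing for a missing directory.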