Databricks deployment with Terraform


I just noticed that something has been updated in Azure (I can't find clear documentation about it) regarding deploying the Databricks workspace.

I'm using Terraform to deploy the resource, and it now also deploys an access connector and a Data Lake storage account for Unity Catalog; the workspace comes with a default metastore without a storage path. In the catalog browser I also see a catalog whose name is the same as my workspace name, and the access connector is assigned the Storage Blob Data Contributor role on the storage account that gets created automatically. The strange thing is that my Terraform plan does not show any of this (something the Terraform provider still needs to catch up on, I guess).

I tried creating the Databricks resource manually in the Portal, and the same resources got deployed.

I tried to ignore them and continue with my deployment, where I create my own Data Lake storage account to be used for my UC and external locations. But the first conflict I hit was that I could not create more than one metastore in the same region. I deleted the default metastore and that solved the problem.

But now I have the problem that, when I call the notebook from Azure Data Factory, it tries to access the filesystem that was deployed as the default.

The error I get is:

Databricks execution failed with error state: InternalError, error message: Operation failed: "The specified filesystem does not exist.", 404, HEAD, https://dbstorageq.dfs.core.windows.net/jobs/?upn=false&action=getAccessControl&timeout=60 Cause: Operation failed: "The specified filesystem does not exist.", 404, HEAD

Indeed, I don't have an external location in Databricks linked to this storage account (dbstorageq), and this is the storage account that was deployed as the default.

My guess is that the default filesystem it is referring to is the one Azure deployed by itself, but how am I supposed to change this to the one I have deployed?

P.S. Is there any documentation or update from Azure about the above?

1 answer

Answered by Vinay B (accepted answer):

Recent Updates in Azure Databricks Workspace Deployment and Terraform Integration

A possible reason for the problem you are facing is that Azure Databricks now automatically creates some resources when a workspace is deployed, such as a default Unity Catalog metastore, its managed storage account, and the access connector you mentioned. Because these are created by the service itself rather than by your configuration, they do not show up in your Terraform plan.
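
For reference, a minimal workspace deployment in Terraform typically looks like the sketch below (the resource group, names, and SKU are illustrative placeholders, not your actual configuration). Only these resources appear in the plan; the default metastore, the workspace catalog, and the dbstorage account are provisioned by the service on top of them.

```hcl
# Minimal sketch of an Azure Databricks workspace deployment.
# Names, location, and SKU are placeholders; adapt to your own environment.

resource "azurerm_resource_group" "this" {
  name     = "rg-databricks-example"
  location = "westeurope"
}

resource "azurerm_databricks_workspace" "this" {
  name                = "dbw-example"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location
  sku                 = "premium"
}

# Access connector with a system-assigned managed identity, comparable to the
# one the platform now provisions for the default Unity Catalog setup.
resource "azurerm_databricks_access_connector" "this" {
  name                = "dbac-example"
  resource_group_name = azurerm_resource_group.this.name
  location            = azurerm_resource_group.this.location

  identity {
    type = "SystemAssigned"
  }
}
```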

To address the issue you're facing:

  1. Default Filesystem Issue: It seems that Azure Data Factory (ADF) is trying to access the wrong filesystem. The error message ("The specified filesystem does not exist.") indicates that ADF is looking for the default filesystem that Azure Databricks created automatically, not the one you've chosen. To fix this, you'll need to make sure your ADF pipeline or Databricks job points to the right filesystem. You can do this by providing the correct filesystem path in your ADF activity or Databricks notebook/job settings.

  2. Terraform Configuration for Azure Databricks: In your Terraform configuration, it's crucial to define the resources and settings for your Azure Databricks workspace precisely. For deploying an Azure Databricks workspace, you'll typically use the azurerm_databricks_workspace resource. Along with this, you may need to configure additional resources such as databricks_secret_scope, databricks_token, databricks_job, and databricks_cluster, depending on your specific requirements.

  3. Managing Storage and Filesystems: You may have to specify the storage account and filesystem explicitly in your Terraform configuration to make sure you are using the right ones. Azure Databricks supports different kinds of storage mounts, such as Azure Data Lake Storage Gen1 (databricks_azure_adls_gen1_mount), Gen2 (databricks_azure_adls_gen2_mount), or Azure Blob Storage (databricks_azure_blob_mount). For Unity Catalog, the equivalent is a storage credential plus an external location; see the sketch after this list.
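
To tie points 2 and 3 back to the error above: one way to make the ADF-triggered notebook read from a container that actually exists is to register your own ADLS Gen2 account as a Unity Catalog storage credential and external location in Terraform. The sketch below assumes the access connector and resource group from the previous snippet and a Databricks provider configured against your workspace; every name (storage account, container, external location) is a placeholder, so treat it as an outline rather than a drop-in fix.

```hcl
# Your own ADLS Gen2 account for Unity Catalog data (hierarchical namespace enabled).
resource "azurerm_storage_account" "uc" {
  name                     = "stucexample" # placeholder, must be globally unique
  resource_group_name      = azurerm_resource_group.this.name
  location                 = azurerm_resource_group.this.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  is_hns_enabled           = true
}

resource "azurerm_storage_container" "jobs" {
  name                 = "jobs"
  storage_account_name = azurerm_storage_account.uc.name
}

# Let the access connector's managed identity read and write this account.
resource "azurerm_role_assignment" "uc_storage" {
  scope                = azurerm_storage_account.uc.id
  role_definition_name = "Storage Blob Data Contributor"
  principal_id         = azurerm_databricks_access_connector.this.identity[0].principal_id
}

# Register the managed identity as a Unity Catalog storage credential.
resource "databricks_storage_credential" "uc" {
  name = "uc-managed-identity"

  azure_managed_identity {
    access_connector_id = azurerm_databricks_access_connector.this.id
  }
}

# Point an external location at the container your notebooks actually use,
# i.e. abfss://jobs@<your-storage-account>.dfs.core.windows.net/
resource "databricks_external_location" "jobs" {
  name            = "jobs"
  url             = "abfss://${azurerm_storage_container.jobs.name}@${azurerm_storage_account.uc.name}.dfs.core.windows.net/"
  credential_name = databricks_storage_credential.uc.name
  comment         = "External location for ADF-triggered notebook jobs"
  depends_on      = [azurerm_role_assignment.uc_storage]
}
```

Once the external location exists, the notebook (and the ADF activity that calls it) should reference paths under this URL instead of the auto-created dbstorage account.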

References:

https://learn.microsoft.com/en-us/azure/databricks/release-notes/

https://learn.microsoft.com/en-us/azure/databricks/dev-tools/terraform/azure-workspace

https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/vnet-inject