I am using Azure Databricks with a cluster (13.3 LTS) without Credential Passthrough, relying on service principals only.
The goal is to create a schema with a LOCATION option on ADLS (using the abfss protocol), so that I can create a Delta table that gets registered in the catalog while the underlying data is saved in the specified location.
I am able to read and write data to the ADLS storage account using service principals as expected -
spark.conf.set("fs.azure.account.auth.type.MyStorageAcct.dfs.core.windows.net", "OAuth")
spark.conf.set("fs.azure.account.oauth.provider.type.MyStorageAcct.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set("fs.azure.account.oauth2.client.id.MyStorageAcct.dfs.core.windows.net", "relevant_vals")
spark.conf.set("fs.azure.account.oauth2.client.secret.MyStorageAcct.dfs.core.windows.net", "relevant_vals")
spark.conf.set("fs.azure.account.oauth2.client.endpoint.MyStorageAcct.dfs.core.windows.net", "https://login.microsoftonline.com/relevant_vals/oauth2/token")
However, when I try to create the schema -
%sql
CREATE schema my_new_schema LOCATION "abfss://<container>@MyStorageAcct.dfs.core.windows.net/my_new_schema.db"
I get this error -
AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: shaded.databricks.azurebfs.org.apache.hadoop.fs.azurebfs.contracts.exceptions.KeyProviderException Failure to initialize configuration for storage account MyStorageAcct.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key)
I do not have any issues reading or writing data to ADLS using the service principals, so it feels like the issue is with registering the schema/tables in the default metastore. If I create the schema with Credential Passthrough (as my ID has access) I am able to create the schema, but then writing data:
df.write.format('delta')\
.mode("overwrite")\
.option("path","abfss://[email protected]/my_new_schema.db/table_test_copy_v20240307.delta")\
.saveAsTable('my_new_schema.table_test_copy_v20240307')
fails with:
Failure to initialize configuration for storage account MyStorageAcct.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.key
Can someone please help me understand what the issue is, and how I can register schemas and tables in the Databricks Hive metastore while specifying a location on ADLS using service principals? Thanks!
Regarding the error message:
When using the abfss protocol, you cannot directly use a storage account SAS token or a storage account access key to access data. To use abfss, you must provide additional configuration options (for example, the OAuth service-principal settings shown in the question), as detailed in the documentation.
Using a storage access key is only recommended with the wasbs protocol, and it is not advisable for ADLS Gen2. Additionally, if you have the necessary permissions, you can use a passthrough cluster to access the storage account. To create a table in a blob folder location and store it in an external location, follow the steps below.
Set the Variables:
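For example (a minimal sketch; the container name, secret scope, and secret keys are placeholders, and the OAuth settings mirror the ones from the question):

# Placeholder values - replace with your own container, secret scope, and secret keys
storage_account = "MyStorageAcct"
container = "my-container"
tenant_id = dbutils.secrets.get(scope="my-scope", key="tenant-id")
client_id = dbutils.secrets.get(scope="my-scope", key="sp-client-id")
client_secret = dbutils.secrets.get(scope="my-scope", key="sp-client-secret")

# Same service-principal OAuth configuration as in the question, driven by the variables above
spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net", f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# Base ADLS path used by the CREATE statements below
base_location = f"abfss://{container}@{storage_account}.dfs.core.windows.net"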
Create Catalog:
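A sketch of this step, assuming a Unity Catalog-enabled workspace with a storage credential and an external location already defined for this path (CREATE CATALOG is not available in the legacy Hive metastore); the catalog name is illustrative:

# Hypothetical catalog name; MANAGED LOCATION requires a matching external location in Unity Catalog
spark.sql(f"""
    CREATE CATALOG IF NOT EXISTS my_catalog
    MANAGED LOCATION '{base_location}/my_catalog'
""")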
Create Schema:
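And the schema itself, pointing at the path from the question (again a sketch with illustrative names):

# In Unity Catalog, MANAGED LOCATION plays the role of the LOCATION clause used in the question
spark.sql(f"""
    CREATE SCHEMA IF NOT EXISTS my_catalog.my_new_schema
    MANAGED LOCATION '{base_location}/my_new_schema.db'
""")

Once the schema exists, the saveAsTable call from the question can target my_catalog.my_new_schema.table_test_copy_v20240307. If you stay on the legacy Hive metastore instead, the usual fix for the fs.azure.account.key error on CREATE SCHEMA ... LOCATION is to add the same fs.azure.* service-principal settings to the cluster's Spark configuration (prefixed with spark.hadoop.), since the metastore client does not pick up session-level spark.conf.set values.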
References: MS Q&A
SO Link.