Writing delta tables to datalake using directory level SAS user delegation token in PySpark


I have created a user delegation SAS token at directory level with all relevant rights.

I am using the following configuration to set up the Spark environment:


spark.conf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark.conf.set("fs.azure.account.auth.type", "SAS")
spark.conf.set("fs.azure.sas.<mycontainer>.<mystorageaccountname>.blob.core.windows.net", sas_token)

shaded.databricks.org.apache.hadoop.fs.azure.AzureException: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: Server failed to authenticate the request. Make sure the value of Authorization header is formed correctly including the signature.

I get the error above. The user delegation SAS token has all permissions (read, add, create, write, list, move, execute) on the directory. Has anyone else faced this error? A solution would be helpful.

When I create a user delegation SAS token at container level, the code above works fine and I can write both delta and parquet tables. But when I create the user delegation SAS token at directory level, I get the error above.

There is 1 answer

Answered by Bhavani

Make sure you have supplied the correct SAS token for authentication. I got the same error when I supplied an incorrect SAS token.
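One way to check whether a token is the one you think it is: decode its query parameters before handing it to Spark. The sketch below is only an illustration; `describe_sas` and the example token are hypothetical, and it assumes the usual SAS field names (`sr` for the signed resource, `sp` for permissions, `se` for expiry, `sdd` for directory depth, `skoid` for the user delegation key).

```python
from urllib.parse import parse_qs


def describe_sas(sas_token: str) -> dict:
    """Return the SAS fields that are most useful when debugging auth failures.

    Hypothetical helper, not part of any Azure or Spark library.
    """
    # Some tools prepend "?" to the token; drop it before parsing.
    fields = parse_qs(sas_token.lstrip("?"))

    def first(key):
        # parse_qs returns a list per key; take the first value or None.
        return fields.get(key, [None])[0]

    return {
        "signed_resource": first("sr"),   # assumed: "c" container, "d" directory, "b" blob
        "permissions": first("sp"),
        "expiry": first("se"),
        "directory_depth": first("sdd"),  # normally present only for directory-level tokens
        "is_user_delegation": "skoid" in fields,
    }


# Made-up token for illustration only; the signature is not valid.
example = (
    "?sv=2022-11-02&sr=d&sdd=1&sp=racwlme"
    "&se=2030-01-01T00:00:00Z"
    "&skoid=00000000-0000-0000-0000-000000000000&sig=REDACTED"
)
info = describe_sas(example)
print(info)
```

If `signed_resource` or `expiry` is not what you expected, the token was probably generated for a different scope or has already expired.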

Make sure you followed this procedure to create the user delegation SAS token at directory level: right-click the directory and select Generate SAS, select the user delegation key as the signing key, select the permissions, generate the SAS URL, and copy it for authentication.
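The procedure above yields a full SAS URL, whereas the `fs.azure.sas...` setting takes the token itself. A minimal sketch of separating the two, assuming the token is everything after the `?` in the URL; `token_from_sas_url` and the example URL are hypothetical:

```python
from urllib.parse import urlsplit


def token_from_sas_url(sas_url: str) -> str:
    """Return only the query string (the SAS token) of a SAS URL.

    Hypothetical helper for illustration.
    """
    token = urlsplit(sas_url).query
    if "sig=" not in token:
        # Every SAS carries a signature; without one this is not a SAS URL.
        raise ValueError("no 'sig' parameter found; this does not look like a SAS URL")
    return token


# Made-up URL; account, container, directory and signature are placeholders.
sas_url = (
    "https://myaccount.blob.core.windows.net/mycontainer/mydir"
    "?sv=2022-11-02&sr=d&sig=REDACTED"
)
sas_token = token_from_sas_url(sas_url)
print(sas_token)
```

The resulting `sas_token` has no leading `?` and no host or path, which is the form the configuration snippets in this thread pass to `spark.conf.set`.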

Use the code below to read a parquet file:

spark.conf.set("fs.azure", "org.apache.hadoop.fs.azure.NativeAzureFileSystem")
spark.conf.set("fs.azure.account.auth.type", "SAS")
spark.conf.set("fs.azure.sas.<containerName>.<storageAccountName>.blob.core.windows.net", "SASToken")
file_path = "<directory>/<file>"
df = spark.read.format("parquet").load("wasbs://<containerName>@<storageAccountName>.blob.core.windows.net/" + file_path)
df.show()

The file is read successfully without any error.
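The question is about writing a Delta table, while the snippet above only reads parquet. A rough sketch of the write side, using the same `wasbs://` path and SAS setting, could look like this. The helper names (`sas_conf_key`, `wasbs_path`, `write_delta`) are hypothetical, `spark` and `df` are assumed to be an existing session and DataFrame, and the Delta format is assumed to be available on the cluster. It has not been verified here that a directory-level token is accepted for writes.

```python
def sas_conf_key(container: str, account: str) -> str:
    """Build the per-container SAS setting name used in this thread."""
    return f"fs.azure.sas.{container}.{account}.blob.core.windows.net"


def wasbs_path(container: str, account: str, directory: str) -> str:
    """Build a wasbs:// URL for a directory inside the container."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{directory.strip('/')}"


def write_delta(spark, df, container, account, directory, sas_token):
    """Register the SAS token, then write df as a Delta table under directory.

    Hypothetical helper; spark is a SparkSession and df a DataFrame.
    """
    spark.conf.set(sas_conf_key(container, account), sas_token)
    target = wasbs_path(container, account, directory)
    # mode("overwrite") replaces any existing table at the target path.
    df.write.format("delta").mode("overwrite").save(target)
    return target
```

Usage would be along the lines of `write_delta(spark, df, "<containerName>", "<storageAccountName>", "<directory>", sas_token)`, with the target directory being the one the SAS token was generated for.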