Writing parquet file throws...An HTTP header that's mandatory for this request is not specified


I have two ADLS Gen2 storage accounts, both with hierarchical namespace enabled. In my Python notebook, I'm reading a CSV file from one storage account and, after some enrichment, writing it as a parquet file to the other storage account.

I am getting the below error when writing the parquet file:

StatusCode=400, An HTTP header that's mandatory for this request is not specified.

Any help is greatly appreciated.

Below is my Notebook code snippet...

# Databricks notebook source
# MAGIC %python
# MAGIC 
# MAGIC STAGING_MOUNTPOINT = "/mnt/inputfiles"
# MAGIC if STAGING_MOUNTPOINT in [mnt.mountPoint for mnt in dbutils.fs.mounts()]:
# MAGIC   dbutils.fs.unmount(STAGING_MOUNTPOINT)
# MAGIC 
# MAGIC PERM_MOUNTPOINT = "/mnt/outputfiles"
# MAGIC if PERM_MOUNTPOINT in [mnt.mountPoint for mnt in dbutils.fs.mounts()]:
# MAGIC   dbutils.fs.unmount(PERM_MOUNTPOINT)

STAGING_STORAGE_ACCOUNT = "--------"
STAGING_CONTAINER = "--------"
STAGING_FOLDER = "--------"
PERM_STORAGE_ACCOUNT = "--------"
PERM_CONTAINER = "--------"

configs = {
 "fs.azure.account.auth.type": "OAuth",
 "fs.azure.account.oauth.provider.type": 
 "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
 "fs.azure.account.oauth2.client.id": "#####################",
 "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="DemoScope",key="DemoSecret"),
 "fs.azure.account.oauth2.client.endpoint": 
 "https://login.microsoftonline.com/**********************/oauth2/token"}

STAGING_SOURCE = "abfss://{container}@{storage_acct}.blob.core.windows.net/".format(
    container=STAGING_CONTAINER, storage_acct=STAGING_STORAGE_ACCOUNT)

try:
  dbutils.fs.mount(
    source=STAGING_SOURCE,
    mount_point=STAGING_MOUNTPOINT,
    extra_configs=configs)
except Exception as e:
  if "Directory already mounted" in str(e):
    pass  # Ignore error if already mounted.
  else:
    raise e

print("Staging Storage mount Success.")

inputDemoFile = "{}/{}/demo.csv".format(STAGING_MOUNTPOINT, STAGING_FOLDER)
readDF = (spark
          .read.option("header", True)
          .schema(inputSchema)
          .option("inferSchema", True)
          .csv(inputDemoFile))

PERM_SOURCE = "abfss://{container}@{storage_acct}.blob.core.windows.net/".format(
    container=PERM_CONTAINER, storage_acct=PERM_STORAGE_ACCOUNT)

try:
  dbutils.fs.mount(
    source=PERM_SOURCE,
    mount_point=PERM_MOUNTPOINT,
    extra_configs=configs)
except Exception as e:
  if "Directory already mounted" in str(e):
    pass  # Ignore error if already mounted.
  else:
    raise e

print("Landing Storage mount Success.")

outPatientsFile = "{}/patients.parquet".format(outPatientsFilePath)
print("Writing to parquet file: " + outPatientsFile)

*** The call below is failing; the error is:
StatusCode=400
StatusDescription=An HTTP header that's mandatory for this request is not specified.
ErrorCode=
ErrorMessage= ***

(readDF
 .coalesce(1)
 .write
 .mode("overwrite")
 .option("header", "true")
 .option("compression", "snappy")
 .parquet(outPatientsFile)
)

There are 2 answers

CHEEKATLAPRADEEP:

A couple of important points to note while mounting storage accounts in Azure Databricks:

For Azure Blob storage: source = "wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>"

For Azure Data Lake Storage gen2: source = "abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/"

To mount an Azure Data Lake Storage Gen2 filesystem, or a folder inside it, as an Azure Databricks file system, the URL should be of the form abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/
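
Applied to the code in the question, only the endpoint host in the mount source needs to change. A minimal sketch, reusing the variable names and OAuth configs from the question:

# Sketch: the same mount as in the question, but pointing the abfss URL at the dfs
# endpoint instead of blob.core.windows.net, which is what triggers the 400
# "An HTTP header that's mandatory for this request is not specified" error.
STAGING_SOURCE = "abfss://{container}@{storage_acct}.dfs.core.windows.net/".format(
    container=STAGING_CONTAINER, storage_acct=STAGING_STORAGE_ACCOUNT)

dbutils.fs.mount(
    source=STAGING_SOURCE,            # dfs endpoint, not blob
    mount_point=STAGING_MOUNTPOINT,   # e.g. "/mnt/inputfiles"
    extra_configs=configs)            # same OAuth configs as in the question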


Reference: Azure Databricks - Azure Data Lake Storage Gen2

Jim Xu:

I summarize the solution below.

If you want to mount Azure Data Lake Storage Gen2 as an Azure Databricks file system, the URL should be of the form abfss://<file-system-name>@<storage-account-name>.dfs.core.windows.net/. For more details, please refer to the documentation here.

For example

  1. Create an Azure Data Lake Storage Gen2 account:
az login
az storage account create \
    --name <account-name> \
    --resource-group <group name> \
    --location westus \
    --sku Standard_RAGRS \
    --kind StorageV2 \
    --enable-hierarchical-namespace true
  2. Create a service principal and assign the Storage Blob Data Contributor role to it, scoped to the Data Lake Storage Gen2 storage account:
az login

az ad sp create-for-rbac -n "MyApp" --role "Storage Blob Data Contributor" \
    --scopes /subscriptions/<subscription>/resourceGroups/<resource-group>/providers/Microsoft.Storage/storageAccounts/<storage-account>
  3. Mount Azure Data Lake Storage Gen2 in Azure Databricks (Python):
configs = {"fs.azure.account.auth.type": "OAuth",
       "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
       "fs.azure.account.oauth2.client.id": "<appId>",
       "fs.azure.account.oauth2.client.secret": "<clientSecret>",
       "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant>/oauth2/token",
       "fs.azure.createRemoteFileSystemDuringInitialization": "true"}

dbutils.fs.mount(
    source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/folder1",
    mount_point = "/mnt/flightdata",
    extra_configs = configs)
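
Once the mount succeeds, the write from the question should work against the mount point. A minimal sketch, assuming the readDF DataFrame from the question and the /mnt/flightdata mount above (the output path is illustrative):

# Sketch: write the enriched DataFrame as snappy-compressed parquet under the mount point.
outPatientsFile = "/mnt/flightdata/patients.parquet"  # illustrative path under the mount

(readDF
 .coalesce(1)                        # single output file, as in the question
 .write
 .mode("overwrite")
 .option("compression", "snappy")
 .parquet(outPatientsFile))

Note that the .option("header", "true") from the question applies to CSV output and has no effect on a parquet write, so it is dropped here.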