I am trying to pass a storage account key to a Databricks notebook parameter from an Azure DevOps pipeline.
trigger:
- development
pool: SharedAKS
jobs:
- job: AzureCLI
  steps:
  - checkout: self
  - task: AzureCLI@2
    inputs:
      azureSubscription: $(azureSubscription)
      addSpnToEnvironment: true
      scriptType: 'pscore'
      scriptLocation: 'inlineScript'
      inlineScript: |
        # Install the databricks-cli and look up the storage account key
        $rg = ""
        $storageAccountName = ""
        $resourceGroup = $(az group list --query "[?contains(name, '$(rg)')].name | [0]" --output tsv)
        $accountKey = $(az storage account keys list --resource-group $rg --account-name $storageAccountName --query "[0].value" --output tsv)
        $env:STORAGE_ACCOUNT_KEY = $accountKey
        echo "Storage Account Key: $accountKey"
        python -m pip install --upgrade pip setuptools wheel databricks-cli
        $wsId = (az resource show --resource-type Microsoft.Databricks/workspaces -g $(rg) -n $(databricksName) --query id -o tsv)
        $workspaceUrl = (az resource show --resource-type Microsoft.Databricks/workspaces -g $(rg) -n $(databricksName) --query properties.workspaceUrl --output tsv)
        $workspaceUrlPost = 'https://'
        $workspaceUrlPost += $workspaceUrl
        $workspaceUrlPost += '/api/2.0/token/create'
        echo "Https Url with Post: $workspaceUrlPost"
        $workspaceUrlHttps = 'https://'
        $workspaceUrlHttps += $workspaceUrl
        $workspaceUrlHttps += '/'
        echo "Https Url : $workspaceUrlHttps"
        # Token for the Azure Databricks app
        $token = (az account get-access-token --resource $(AZURE_DATABRICKS_APP_ID) --query "accessToken" --output tsv)
        echo "Token retrieved: $token"
        # Get a token for the Azure management API
        $azToken = (az account get-access-token --resource https://management.core.windows.net/ --query "accessToken" --output tsv)
        # Create a PAT token (lifetime_seconds is 6000 below). Note the quota limit of 600 tokens.
        $pat_token_response = (curl --insecure -X POST ${workspaceUrlPost} `
          -H "Authorization: Bearer $token" `
          -H "X-Databricks-Azure-SP-Management-Token:$azToken" `
          -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId" `
          -d '{"lifetime_seconds": 6000, "comment": "this is an example token"}')
        echo "Token retrieved: $token"
        echo "DATABRICKS_TKN: $pat_token_response"
        # Extract the PAT token from the JSON response
        $dapiToken = ($pat_token_response | ConvertFrom-Json).token_value
        # dapiToken=$(echo $pat_token_response | jq -r .token_value)
        echo "DATABRICKS_TOKEN: $dapiToken"
        $DATABRICKSTKN = $dapiToken
        echo "##vso[task.setvariable variable=DATABRICKSTKN]$DATABRICKSTKN"
  - script: |
      echo "$(DATABRICKSTKN)"
      echo "Starting Databricks notebook upload..."
      # Install Databricks CLI
      pip install databricks-cli
      echo "DATABRICKS_TOKEN: $(DATABRICKSTKN)"
      # Authenticate with Databricks using the PAT
      echo "Authenticating with Databricks..."
      echo "DATABRICKS_TOKEN: $dapiToken"
      databricks configure --token <<EOF
      https://adb-82.14.azuredatabricks.net
      $(DATABRICKSTKN)
      EOF
    displayName: 'Upload Databricks Notebooks Job'
  - task: Bash@3
    displayName: 'Schedule Databricks Notebook'
    inputs:
      targetType: 'inline'
      script: |
        databricksUrl='https://adb-8.14.azuredatabricks.net/api/2.0'
        notebookPath1='/Users/user/notebook'
        jobName2='testjob'
        requestUriRun="$databricksUrl/jobs/runs/submit"
        body2='{
          "name": "'$jobName2'",
          "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 0
          },
          "notebook_task": {
            "notebook_path": "'$notebookPath1'",
            "base_parameters": {
              "env": {"STORAGE_ACCOUNT_KEY": "'$STORAGE_ACCOUNT_KEY'"}
            }
          }
        }'
        curl -X POST -H "Authorization: Bearer $(DATABRICKSTKN)" -H "Content-Type: application/json" -d "$body2" "$requestUriRun"
I can see the following in the pipeline logs:
Storage Account Key: ***
Below is the Databricks notebook setup.
Cell 1:
%python
dbutils.widgets.text("env", "", "Environment Variable")
env = dbutils.widgets.get("env")
print("Value of 'env' parameter:", env)
Output:
Value of 'env' parameter:
Cell 2:
%python
# Databricks notebook source
storage_account_name = ""
storage_account_access_key = env
container = "raw"
mountDir = ""
dbutils.fs.mount(
  source = "wasbs://" + container + "@xxxx.blob.core.windows.net",
  mount_point = "/mnt/" + mountDir,
  extra_configs = {"fs.azure.account.key." + storage_account_name + ".blob.core.windows.net": storage_account_access_key}
)
Error:
shaded.databricks.org.apache.hadoop.fs.azure.AzureException: java.lang.IllegalArgumentException: Storage Key is not a valid base64 encoded string.
I see an empty key in Databricks when I try to print it, and when I pass the variable to Cell 2 I get the above error. Am I passing the storage account key properly in the JSON body?
Thank you.
After getting the value of the storage account key using the command below:
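az storage account keys list --resource-group $rg --account-name $storageAccountName --query "[0].value" --output tsv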
If you want to pass the value to the subsequent pipeline tasks, you need to use the logging command SetVariable to set a pipeline variable with the value. The subsequent tasks can then use the value by referencing that pipeline variable.
If you want to pass the value of the storage account key to the subsequent tasks within the same job, you can set a general pipeline variable with the value. The command also automatically maps an environment variable for the general variable. This variable will only be available to the subsequent tasks within the same job.
For example:
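A minimal sketch of that approach (the variable name STORAGEKEY and the follow-up Bash@3 task are only placeholders, reuse your own values):

- task: AzureCLI@2
  inputs:
    azureSubscription: $(azureSubscription)
    addSpnToEnvironment: true
    scriptType: 'pscore'
    scriptLocation: 'inlineScript'
    inlineScript: |
      $accountKey = $(az storage account keys list --resource-group $rg --account-name $storageAccountName --query "[0].value" --output tsv)
      # Set a pipeline variable that the subsequent tasks in this job can read
      echo "##vso[task.setvariable variable=STORAGEKEY]$accountKey"
- task: Bash@3
  inputs:
    targetType: 'inline'
    script: |
      # Read the value via the macro syntax or via the mapped environment variable
      echo "$(STORAGEKEY)"
      echo "$STORAGEKEY"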
If you want to pass the value of the storage account key to other jobs or stages within the same pipeline, you can set an output variable with the value. For more details, see "Use output variables from tasks".
For example:
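A sketch of the output-variable form (the job names A and B, the step name GetKey and the variable names are placeholders):

jobs:
- job: A
  steps:
  - task: AzureCLI@2
    name: GetKey   # a reference name is required to read the output from other jobs
    inputs:
      azureSubscription: $(azureSubscription)
      scriptType: 'pscore'
      scriptLocation: 'inlineScript'
      inlineScript: |
        $accountKey = $(az storage account keys list --resource-group $rg --account-name $storageAccountName --query "[0].value" --output tsv)
        # isOutput=true makes the variable readable from other jobs
        echo "##vso[task.setvariable variable=STORAGEKEY;isOutput=true]$accountKey"
- job: B
  dependsOn: A
  variables:
    storageKeyFromA: $[ dependencies.A.outputs['GetKey.STORAGEKEY'] ]
  steps:
  - script: echo "$(storageKeyFromA)"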
In addition, the following command you are using in the AzureCLI@2 task may only set up a temporary environment variable that is available for the current session of the task. After the task is completed, it is discarded and is not available to the subsequent tasks.
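$env:STORAGE_ACCOUNT_KEY = $accountKey

That is likely why $STORAGE_ACCOUNT_KEY resolves to an empty string in your Bash@3 task and the notebook receives an empty key. Setting a pipeline variable as shown above and referencing it (for example $(STORAGE_ACCOUNT_KEY)) in the JSON body should make the value available there.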