I am using service principal (PAT token) authentication to authenticate to Databricks and schedule a Databricks notebook. Here is my pipeline:
trigger:
- development

pool: SharedAKS

jobs:
- job: AzureCLI
  steps:
  - checkout: self
  - task: AzureCLI@2
    inputs:
      azureSubscription: $(azureSubscription)
      addSpnToEnvironment: true
      scriptType: 'pscore'
      scriptLocation: 'inlineScript'
      inlineScript: |
        # Databricks CLI
        # install databricks-cli
        $rg=""
        $storageAccountName=""
        $resourceGroup=$(az group list --query "[?contains(name, '$(rg)')].name | [0]" --output tsv)
        $accountKey=$(az storage account keys list --resource-group $rg --account-name $storageAccountName --query "[0].value" --output tsv)
        $env:STORAGE_ACCOUNT_KEY = $accountKey
        echo "Storage Account Key: $accountKey"
        python -m pip install --upgrade pip setuptools wheel databricks-cli
        $wsId=(az resource show --resource-type Microsoft.Databricks/workspaces -g $(rg) -n $(databricksName) --query id -o tsv)
        $workspaceUrl=(az resource show --resource-type Microsoft.Databricks/workspaces -g $(rg) -n $(databricksName) --query properties.workspaceUrl --output tsv)
        $workspaceUrlPost='https://'
        $workspaceUrlPost+=$workspaceUrl
        $workspaceUrlPost+='/api/2.0/token/create'
        echo "Https Url with Post: $workspaceUrlPost"
        $workspaceUrlHttps='https://'
        $workspaceUrlHttps+=$workspaceUrl
        $workspaceUrlHttps+='/'
        echo "Https Url : $workspaceUrlHttps"
        # Token for the Azure Databricks application
        $token=(az account get-access-token --resource $(AZURE_DATABRICKS_APP_ID) --query "accessToken" --output tsv)
        echo "Token retrieved: $token"
        # Get a token for the Azure management API
        $azToken=(az account get-access-token --resource https://management.core.windows.net/ --query "accessToken" --output tsv)
        # Create a PAT token valid for 6000 seconds (100 minutes). Note the quota limit of 600 tokens per user.
        $pat_token_response=(curl --insecure -X POST ${workspaceUrlPost} `
          -H "Authorization: Bearer $token" `
          -H "X-Databricks-Azure-SP-Management-Token:$azToken" `
          -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId" `
          -d '{"lifetime_seconds": 6000,"comment": "this is an example token"}')
        echo "Token retrieved: $token"
        echo "DATABRICKS_TKN: $pat_token_response"
        # Extract the PAT token value from the JSON response
        $dapiToken=($pat_token_response | ConvertFrom-Json).token_value
        #dapiToken=$(echo $pat_token_response | jq -r .token_value)
        echo "DATABRICKS_TOKEN: $dapiToken"
        $DATABRICKSTKN = $dapiToken
        echo "##vso[task.setvariable variable=DATABRICKSTKN]$DATABRICKSTKN"
  - script: |
      echo "$(DATABRICKSTKN)"
      echo "Starting Databricks notebook upload..."
      # Install Databricks CLI
      pip install databricks-cli
      echo "DATABRICKS_TOKEN: $(DATABRICKSTKN)"
      # Authenticate with Databricks using the PAT
      echo "Authenticating with Databricks..."
      echo "DATABRICKS_TOKEN: $dapiToken"
      databricks configure --token <<EOF
      https://adb-82410274.14.azuredatabricks.net
      $(DATABRICKSTKN)
      EOF
      # Specify the full path to the source notebook
      common_function_path="$(Build.SourcesDirectory)/notebook"
      # Specify the full target path in Databricks
      common_function_target_path="/Users/user/notebook"
      # Upload notebooks to the Databricks workspace
      echo "Uploading notebooks to Databricks..."
      databricks workspace import --language sql --overwrite "$common_function_path" "$common_function_target_path"
      echo "Notebooks uploaded successfully."
      echo "Flex_DDLs.sql notebook executed successfully."
    displayName: 'Upload Databricks Notebooks Job'
  - task: Bash@3
    displayName: 'Schedule Databricks Notebook'
    inputs:
      targetType: 'inline'
      script: |
        databricksUrl='https://adb-824194.14.azuredatabricks.net/api/2.0'
        notebookPath='/Users/user/notebook'
        jobName='ScheduledJobName1'
        requestUri="$databricksUrl/jobs/create"
        body='{
          "name": "'$jobName'",
          "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 0
          },
          "notebook_task": {
            "notebook_path": "'$notebookPath'"
          },
          "schedule": {
            "quartz_cron_expression": "45 10 * * * ?",
            "timezone_id": "Canada/Eastern"
          }
        }'
        # Make the API request
        curl -X POST -H "Authorization: Bearer $(DATABRICKSTKN)" -H "Content-Type: application/json" -d "$body" "$requestUri"
Below is the log for the scheduling part.
Starting: Schedule Databricks Notebook
==============================================================================
Task : Bash
Description : Run a Bash script on macOS, Linux, or Windows
Version : 3.229.0
Author : Microsoft Corporation
Help : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/bash
==============================================================================
Generating script.
========================== Starting Command Output ===========================
/usr/bin/bash /home/vsts/work/_temp/3d882ebd-1553-46f0-b2a2-4f29ff0.sh
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 401 100 26 100 375 54 791 --:--:-- --:--:-- --:--:-- 845
{"job_id":789888762} % Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 365 100 27 100 338 69 870 --:--:-- --:--:-- --:--:-- 940
Finishing: Schedule Databricks Notebook
I can see the job_id generated in the logs, but I am unable to see the job in the Databricks UI. Do I need any other access to see jobs in Databricks when authenticated with a service principal (PAT token)?
Thanks
You are submitting the job as a service principal but attempting to view it as a user. You will only see the job if you are an admin (apparently you are not) or if the job has an explicit ACL allowing you to see it.
Check the Jobs API docs and add the following to the payload:
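Something along these lines (the user_name is a placeholder for your own workspace login; as far as I recall, access_control_list at creation time is documented for the Jobs API 2.1 create endpoint, so you may need to call /api/2.1/jobs/create rather than the 2.0 one):

"access_control_list": [
  {
    "user_name": "your.name@example.com",
    "permission_level": "CAN_MANAGE"
  }
]

For the job that has already been created (the job_id printed in your log), a sketch of granting the same access after the fact with the Permissions API, using the service principal's PAT from the pipeline (job_id and user_name are placeholders):

# Grant a named user access to an existing job
curl -X PATCH "https://adb-824194.14.azuredatabricks.net/api/2.0/permissions/jobs/<job_id>" \
  -H "Authorization: Bearer $(DATABRICKSTKN)" \
  -H "Content-Type: application/json" \
  -d '{"access_control_list": [{"user_name": "your.name@example.com", "permission_level": "CAN_MANAGE"}]}'

CAN_VIEW is enough if the user only needs to see the job and its runs.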