Unable to see scheduled job in Databricks workflow


I am using service principal (PAT token) authentication to authenticate to Databricks and schedule a Databricks notebook from an Azure DevOps pipeline:

trigger:
- development

pool: SharedAKS

jobs:
- job: AzureCLI
  steps:
  - checkout: self
  - task: AzureCLI@2
    inputs:
      azureSubscription: $(azureSubscription)
      addSpnToEnvironment: true
      scriptType: 'pscore'
      scriptLocation: 'inlineScript'
      inlineScript: |
        # Placeholders for the resource group and storage account (values redacted)
        $rg = ""
        $storageAccountName = ""
        $resourceGroup = $(az group list --query "[?contains(name, '$(rg)')].name | [0]" --output tsv)
        $accountKey = $(az storage account keys list --resource-group $rg --account-name $storageAccountName --query "[0].value" --output tsv)
        $env:STORAGE_ACCOUNT_KEY = $accountKey

        
        echo "Storage Account Key: $accountKey"
        python -m pip install --upgrade pip setuptools wheel databricks-cli
        
        $wsId=(az resource show --resource-type Microsoft.Databricks/workspaces -g $(rg) -n $(databricksName) --query id -o tsv)
        $workspaceUrl=(az resource show --resource-type Microsoft.Databricks/workspaces -g $(rg) -n $(databricksName) --query properties.workspaceUrl --output tsv)

        $workspaceUrlPost='https://'
        $workspaceUrlPost+=$workspaceUrl
        $workspaceUrlPost+='/api/2.0/token/create'
        echo "Https Url with Post: $workspaceUrlPost"

        $workspaceUrlHttps='https://'
        $workspaceUrlHttps+=$workspaceUrl
        $workspaceUrlHttps+='/'
        echo "Https Url : $workspaceUrlHttps"

        # token response for the Azure Databricks app
        $token=(az account get-access-token --resource $(AZURE_DATABRICKS_APP_ID) --query "accessToken" --output tsv)
        echo "Token retrieved: $token"

        # Get a token for the Azure management API
        $azToken=(az account get-access-token --resource https://management.core.windows.net/ --query "accessToken" --output tsv)

        # Create a PAT token valid for 6000 seconds (100 minutes). Note the workspace quota limit of 600 tokens.
        $pat_token_response=(curl --insecure -X POST ${workspaceUrlPost} `
          -H "Authorization: Bearer $token" `
          -H "X-Databricks-Azure-SP-Management-Token:$azToken" `
          -H "X-Databricks-Azure-Workspace-Resource-Id:$wsId" `
          -d '{"lifetime_seconds": 6000,"comment": "this is an example token"}')
        
        echo "Token retriev: $token" 
        echo "DATABRICKS_TKN: $pat_token_response"
          
        # Print PAT token
        $dapiToken=($pat_token_response | ConvertFrom-Json).token_value
        #dapiToken=$(echo $pat_token_response | jq -r .token_value)
        echo "DATABRICKS_TOKEN: $dapiToken"
        $DATABRICKSTKN = $dapiToken
        echo "##vso[task.setvariable variable=DATABRICKSTKN]$DATABRICKSTKN"

  - script: |
      echo "$(DATABRICKSTKN)"
      echo "Starting Databricks notebook upload..."
      # Install Databricks CLI
      pip install databricks-cli
      echo "DATABRICKS_TOKEN: $(DATABRICKSTKN)"

      # Authenticate with Databricks using the PAT
      echo "Authenticating with Databricks..."
      echo "DATABRICKS_TOKEN: $dapiToken"
      databricks configure --token <<EOF
      https://adb-82410274.14.azuredatabricks.net
      $(DATABRICKSTKN)
      EOF
      # Specify the full path to the source file
      common_function_path="$(Build.SourcesDirectory)/notebook"

      # Specify the full target path in Databricks
      common_function_target_path="/Users/user/notebook"
      

      # Upload notebooks to Databricks workspace
      echo "Uploading notebooks to Databricks..."
      databricks workspace import --language SQL --overwrite "$common_function_path" "$common_function_target_path"
      
      echo "Notebooks uploaded successfully."

      echo "Flex_DDLs.sql notebook executed successfully."
    displayName: 'Upload Databricks Notebooks Job'
  
  - task: Bash@3
    displayName: 'Schedule Databricks Notebook'
    inputs:
      targetType: 'inline'
      script: | 
        databricksUrl='https://adb-824194.14.azuredatabricks.net/api/2.0'
        notebookPath='/Users/user/notebook'
        jobName='ScheduledJobName1'

        requestUri="$databricksUrl/jobs/create"

        body='{
          "name": "'$jobName'",
          "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "Standard_DS3_v2",
            "num_workers": 0
          },
          "notebook_task": {
            "notebook_path": "'$notebookPath'"
          },
          "schedule": {
              "quartz_cron_expression": "45 10 * * * ?",
              "timezone_id": "Canada/Eastern"
          }
        }'

        # Make the API request
        curl -X POST -H "Authorization: Bearer $(DATABRICKSTKN)" -H "Content-Type: application/json" -d "$body" "$requestUri"
        

Below is the log for the scheduling part.

Starting: Schedule Databricks Notebook
==============================================================================
Task         : Bash
Description  : Run a Bash script on macOS, Linux, or Windows
Version      : 3.229.0
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/bash
==============================================================================
Generating script.
========================== Starting Command Output ===========================
/usr/bin/bash /home/vsts/work/_temp/3d882ebd-1553-46f0-b2a2-4f29ff0.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   401  100    26  100   375     54    791 --:--:-- --:--:-- --:--:--   845
{"job_id":789888762}  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100   365  100    27  100   338     69    870 --:--:-- --:--:-- --:--:--   940
Finishing: Schedule Databricks Notebook

I can see the job_id generated in the logs, but I am unable to see the job in the Databricks UI. Do I need any other access to see jobs in Databricks when authenticated with a service principal (PAT token)?
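As a sanity check (a sketch, reusing the job_id from the log above; DATABRICKS_TOKEN stands in for the same PAT), the Jobs API confirms the job exists:

# A JSON job definition in the response means the job was created,
# so the problem is visibility, not creation.
curl -X GET -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  "https://adb-824194.14.azuredatabricks.net/api/2.0/jobs/get?job_id=789888762"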

Thanks

Accepted answer (by Kombajn zbożowy):

You are submitting the job as a service principal but attempting to view it as a user. You will only see the job if you are an admin (apparently you're not) or if the job has an explicit ACL allowing you to see it.

Check the Jobs API docs and add the following to the payload:

"access_control_list": [
  {"user_name": "[email protected]", "permission_level": "CAN_MANAGE_RUN"}
]
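
For illustration, here is how the question's jobs/create body would look with that ACL merged in (a sketch based on this answer; the user name is the placeholder from above and should be the account you log into the workspace with):

body='{
  "name": "'$jobName'",
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 0
  },
  "notebook_task": {
    "notebook_path": "'$notebookPath'"
  },
  "schedule": {
    "quartz_cron_expression": "45 10 * * * ?",
    "timezone_id": "Canada/Eastern"
  },
  "access_control_list": [
    {"user_name": "[email protected]", "permission_level": "CAN_MANAGE_RUN"}
  ]
}'

For the job that was already created (job_id 789888762 in the log), the same grant can be applied after the fact with the Permissions API (again a sketch, assuming the service principal's PAT is used and it retains manage rights on the job):

# Add a CAN_MANAGE_RUN entry for the user to the existing job's ACL.
curl -X PATCH -H "Authorization: Bearer $DATABRICKS_TOKEN" -H "Content-Type: application/json" \
  -d '{"access_control_list": [{"user_name": "[email protected]", "permission_level": "CAN_MANAGE_RUN"}]}' \
  "https://adb-824194.14.azuredatabricks.net/api/2.0/permissions/jobs/789888762"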