I am using Python 3.6 to make API calls to Azure Databricks to create a job that runs a specific notebook. I have followed the instructions for using the API at this link. The only difference is that I am using Python rather than curl. The code I have written is as follows:
import requests
import os
import json
dbrks_create_job_url = "https://"+os.environ['DBRKS_INSTANCE']+".azuredatabricks.net//2.0/jobs/create"
DBRKS_REQ_HEADERS = {
    'Authorization': 'Bearer ' + os.environ['DBRKS_BEARER_TOKEN'],
    'X-Databricks-Azure-Workspace-Resource-Id': '/subscriptions/' + os.environ['DBRKS_SUBSCRIPTION_ID'] + '/resourceGroups/' + os.environ['DBRKS_RESOURCE_GROUP'] + '/providers/Microsoft.Databricks/workspaces/' + os.environ['DBRKS_WORKSPACE_NAME'],
    'X-Databricks-Azure-SP-Management-Token': os.environ['DBRKS_MANAGEMENT_TOKEN']}
body_json = """
{
    "name": "A sample job to trigger from DevOps",
    "tasks": [
        {
            "task_key": "ExecuteNotebook",
            "description": "Execute uploaded notebook including tests",
            "depends_on": [],
            "existing_cluster_id": """ + os.environ["DBRKS_CLUSTER_ID"] + """,
            "notebook_task": {
                "notebook_path": "/Users/myuser/sample-notebook",
                "base_parameters": {}
            },
            "timeout_seconds": 300,
            "max_retries": 1,
            "min_retry_interval_millis": 5000,
            "retry_on_timeout": false
        }
    ],
    "email_notifications": {},
    "name": "my_test_job",
    "max_concurrent_runs": 1
}
"""
print("Request body in json format:")
print(body_json)
response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, data=body_json)
if response.status_code == 200:
    print("Job created successfully!")
    print(response.status_code)
    print(response.content)
else:
    print("Job failed!")
    raise Exception(response.content)
All the OS environment variables are set by my Azure DevOps pipeline. However, you don't need to execute the script from a pipeline; you can run it from your local machine as long as you have a service principal with access to a Databricks workspace. To run the Python script locally, replace those environment variables with your own credentials.
Explaining the variables in the script:
os.environ['DBRKS_INSTANCE']: the name of the Databricks instance.
os.environ['DBRKS_BEARER_TOKEN']: the bearer token. You need this to authenticate your service principal or your user to Databricks. I explain below how to obtain it.
os.environ['DBRKS_MANAGEMENT_TOKEN']: needed if the service principal you are using is not added as a Databricks workspace user or admin. I explain below how to obtain it.
os.environ['DBRKS_SUBSCRIPTION_ID']: the ID of the Azure subscription containing the Databricks workspace.
os.environ['DBRKS_RESOURCE_GROUP']: the name of the Azure resource group of the Databricks workspace.
os.environ['DBRKS_WORKSPACE_NAME']: the name of the Azure Databricks workspace.
os.environ["DBRKS_CLUSTER_ID"]: the ID of the cluster that will execute the job in Databricks.
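For a local run, setting these variables could look like the following sketch (all values are made-up placeholders; substitute your own workspace details):

```python
import os

# Hypothetical placeholder values; replace each one with your own credentials.
os.environ["DBRKS_INSTANCE"] = "adb-1234567890123456.7"
os.environ["DBRKS_SUBSCRIPTION_ID"] = "00000000-0000-0000-0000-000000000000"
os.environ["DBRKS_RESOURCE_GROUP"] = "my-resource-group"
os.environ["DBRKS_WORKSPACE_NAME"] = "my-databricks-workspace"
os.environ["DBRKS_CLUSTER_ID"] = "0923-164208-abcd1234"
```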
When I run my script, I get status code 200, which means it should have worked properly. However, when I look at the list of jobs in the workspace, no new job has been created despite the 200 status code: the job I created is not there. I also changed the API endpoint from azuredatabricks.net//2.0/jobs/create to azuredatabricks.net//2.1/jobs/create; I still get a successful response, but no job is created. I can't understand what I am doing wrong, and if I am doing something wrong, why it doesn't raise an exception but instead returns a 200 status code.
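In case it helps with reproducing this: a quick way to inspect what a 200 actually contained is to check whether the body parses as JSON, since a real Jobs API reply is JSON, while a misrouted request can come back as an HTML page with status 200. A minimal sketch (the helper name is my own):

```python
import json

def looks_like_api_json(body: bytes) -> bool:
    """Return True if the body parses as JSON (a real API reply),
    False otherwise (e.g. an HTML page returned for a malformed URL)."""
    try:
        json.loads(body)
        return True
    except ValueError:
        return False

# A successful /jobs/create reply is JSON such as b'{"job_id": 123}'.
print(looks_like_api_json(b'{"job_id": 123}'))      # True
print(looks_like_api_json(b"<!doctype html> ..."))  # False
```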
One final point, to be able to reproduce the problem I am facing: to get the values for DBRKS_BEARER_TOKEN and DBRKS_MANAGEMENT_TOKEN, you can run the following script and manually replace os.environ['DBRKS_BEARER_TOKEN'] and os.environ['DBRKS_MANAGEMENT_TOKEN'] with the values it prints:
import requests
import json
import os
TOKEN_BASE_URL = 'https://login.microsoftonline.com/' + os.environ['SVCDirectoryID'] + '/oauth2/token'
TOKEN_REQ_HEADERS = {'Content-Type': 'application/x-www-form-urlencoded'}
TOKEN_REQ_BODY = {
    'grant_type': 'client_credentials',
    'client_id': os.environ['SVCApplicationID'],
    'client_secret': os.environ['SVCSecretKey']}
def dbrks_management_token():
    TOKEN_REQ_BODY['resource'] = 'https://management.core.windows.net/'
    response = requests.get(TOKEN_BASE_URL, headers=TOKEN_REQ_HEADERS, data=TOKEN_REQ_BODY)
    if response.status_code == 200:
        print(response.status_code)
    else:
        raise Exception(response.text)
    return response.json()['access_token']

def dbrks_bearer_token():
    TOKEN_REQ_BODY['resource'] = '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d'
    response = requests.get(TOKEN_BASE_URL, headers=TOKEN_REQ_HEADERS, data=TOKEN_REQ_BODY)
    if response.status_code == 200:
        print(response.status_code)
    else:
        raise Exception(response.text)
    return response.json()['access_token']
DBRKS_BEARER_TOKEN = dbrks_bearer_token()
DBRKS_MANAGEMENT_TOKEN = dbrks_management_token()
os.environ['DBRKS_BEARER_TOKEN'] = DBRKS_BEARER_TOKEN
os.environ['DBRKS_MANAGEMENT_TOKEN'] = DBRKS_MANAGEMENT_TOKEN
print("DBRKS_BEARER_TOKEN",os.environ['DBRKS_BEARER_TOKEN'])
print("DBRKS_MANAGEMENT_TOKEN",os.environ['DBRKS_MANAGEMENT_TOKEN'])
- SVCDirectoryID is the tenant ID of the Azure Active Directory (AAD) service principal.
- SVCApplicationID is the client ID of the AAD service principal.
- SVCSecretKey is the secret key of the AAD service principal.
Thank you for your valuable input.
You're mixing up the API versions: the tasks array can be used only with Jobs API 2.1, but you're calling Jobs API 2.0. Another error is that you have // between the host name and the path. Just change dbrks_create_job_url to "https://"+os.environ['DBRKS_INSTANCE']+".azuredatabricks.net/api/2.1/jobs/create".
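For completeness, a minimal sketch of the corrected call, assuming the same environment variables as the question (the placeholder values below are made up). Building the body as a Python dict and serializing it also avoids the string templating in the question, which left the cluster ID unquoted in the JSON:

```python
import json
import os

# Hypothetical placeholder values so the sketch runs standalone.
os.environ.setdefault("DBRKS_INSTANCE", "adb-1234567890123456.7")
os.environ.setdefault("DBRKS_CLUSTER_ID", "0923-164208-abcd1234")

# Single slash plus the "/api" segment, and version 2.1 (required for "tasks").
dbrks_create_job_url = (
    "https://" + os.environ["DBRKS_INSTANCE"]
    + ".azuredatabricks.net/api/2.1/jobs/create"
)

# A dict serialized with json.dumps keeps every value, including the cluster
# ID, correctly quoted.
payload = {
    "name": "my_test_job",
    "max_concurrent_runs": 1,
    "tasks": [
        {
            "task_key": "ExecuteNotebook",
            "existing_cluster_id": os.environ["DBRKS_CLUSTER_ID"],
            "notebook_task": {"notebook_path": "/Users/myuser/sample-notebook"},
            "timeout_seconds": 300,
        }
    ],
}

print(dbrks_create_job_url)
print(json.dumps(payload, indent=2))
# With the headers from the question:
# import requests
# response = requests.post(dbrks_create_job_url, headers=DBRKS_REQ_HEADERS, json=payload)
```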