How to trigger a Databricks job from another Databricks job?

3.4k views

I'm currently working on a project where I have two distinct jobs on Databricks. The second job is dependent on the results of the first one.

I am wondering if there is a way to automatically trigger the second job once the first one has completed successfully. Ideally, I would like to accomplish this directly within Databricks without the need for an external scheduling or orchestration tool. Has anyone been able to implement this type of setup or know if it's possible?


There are 4 answers

Alex Ott (BEST ANSWER)

Databricks is now rolling out new functionality called "Job as a Task" that allows you to trigger another job as a task in a workflow. The documentation isn't updated yet, but you can already see it in the UI.

  • Select "Run Job" when adding a new task.

  • Select the specific job to execute as a task.
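In API terms, the "Run Job" task type described above corresponds to a run_job_task entry in the job's task list. A minimal sketch of a Jobs API 2.1 job definition (the job name, task key, and job_id are placeholders):

```json
{
  "name": "parent-job",
  "tasks": [
    {
      "task_key": "trigger_second_job",
      "run_job_task": {
        "job_id": 123456789
      }
    }
  ]
}
```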

Chen Hirsh

It's possible to start a workflow using the Databricks REST API. See the documentation here: https://docs.databricks.com/api/azure/workspace/jobs/runnow

You can also simply combine all the tasks from the two workflows into one workflow.
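For the REST approach, a minimal sketch using only the Python standard library; the workspace host, token, and job id are placeholders you would supply yourself:

```python
import json
import urllib.request
from typing import Optional


def build_run_now_request(host: str, job_id: int,
                          job_parameters: Optional[dict] = None):
    """Build the URL and JSON body for the Jobs 2.1 run-now endpoint."""
    url = f"{host.rstrip('/')}/api/2.1/jobs/run-now"
    body = {"job_id": job_id}
    if job_parameters:
        body["job_parameters"] = job_parameters
    return url, json.dumps(body).encode()


def trigger_job(host: str, token: str, job_id: int) -> dict:
    """POST to run-now and return the response (contains the new run_id)."""
    url, body = build_run_now_request(host, job_id)
    req = urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

You would call trigger_job from the last task of the first job, so the second job starts only after the first one has reached that task.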

Alexander Volok

It is also possible to run a job programmatically with the Databricks SDK for Python: w.jobs.run_now() triggers an existing job by id, while w.jobs.submit() (shown below) submits a one-time run:


import os
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

notebook_path = "/Users/user1/notebook2"

# Start the target interactive cluster if it is stopped
cluster_id = os.environ["DATABRICKS_CLUSTER_ID"]
w.clusters.ensure_cluster_is_running(cluster_id)

# Submit a one-time run and block until it finishes
run = w.jobs.submit(
    run_name=f"sdk-{time.time_ns()}",
    tasks=[
        jobs.SubmitTask(
            existing_cluster_id=cluster_id,
            notebook_task=jobs.NotebookTask(notebook_path=notebook_path),
            task_key=f"sdk-{time.time_ns()}",
        )
    ],
).result()
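If the second job already exists as a saved job, run_now is the more direct call. A minimal sketch assuming the databricks-sdk package is installed and workspace authentication is configured; the job id is a placeholder:

```python
def trigger_and_wait(job_id: int):
    """Trigger an existing Databricks job by id and block until it finishes."""
    # Imported inside the function so the sketch only needs the SDK when called
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()
    waiter = w.jobs.run_now(job_id=job_id)  # returns a waiter for the new run
    return waiter.result()  # the completed run, raises if the run fails
```

Calling trigger_and_wait from the final cell of the first job's notebook chains the two jobs without any external orchestrator.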


arouneaj

Defining the workflow job as below (in a Databricks Asset Bundle) will trigger another workflow from the current workflow. You need the job id of the workflow that should be run:

resources:
  jobs:
    test_job:
      name: test-job
      tasks:
        - task_key: test-job
          run_job_task:
            job_id: 15910000000000000 # workflow_id
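To run the second job only after the first job's work succeeds, the run_job_task can be combined with depends_on in the same bundle. A hedged sketch; the job names, task keys, notebook path, and job id are placeholders:

```yaml
resources:
  jobs:
    first_job:
      name: first-job
      tasks:
        - task_key: produce_results
          notebook_task:
            notebook_path: /Users/user1/notebook1
        - task_key: trigger_second_job
          depends_on:
            - task_key: produce_results # runs only if this task succeeds
          run_job_task:
            job_id: 123456789 # placeholder: id of the second job
```

Because depends_on tasks only run when their upstream task succeeds, this gives the "trigger on successful completion" behavior the question asks for.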