Deploy SQL workflow with DBX


I am developing a deployment to Azure Databricks via DBX. As part of this I need a data job written in SQL to run every day. The job is located in the file data.sql. I know how to do this with a Python file; in that case I would do the following:

build:
  python: "pip"
environments:
  default:
    workflows:
      
      - name: "workflow-name"
        schedule:
          quartz_cron_expression: "0 0 9 * * ?" # every day at 9:00
          timezone_id: "Europe/Copenhagen" # must be a valid timezone ID

        format: MULTI_TASK

        job_clusters:
          - job_cluster_key: "basic-job-cluster"
            <<: *base-job-cluster # cluster spec defined elsewhere in the deployment file via a YAML anchor

        tasks:
          - task_key: "task-name"
            job_cluster_key: "basic-job-cluster"
            spark_python_task:
              python_file: "file://filename.py"

But how can I change this so that it runs a SQL job instead? I imagine it is the last two lines (spark_python_task: and python_file: "file://filename.py") that need to be changed.

There are 2 answers

renardeinside (accepted answer)

There are various ways to do that.

(1) One of the simplest is to save a SQL query in Databricks SQL and then reference this query from the workflow via sql_task, as described in the Databricks Jobs documentation.
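A rough sketch of how such a task could look in the deployment file, assuming the query has already been saved in Databricks SQL; the task_key, query_id and warehouse_id values below are placeholders you would replace with the ones from your workspace:

        tasks:
          - task_key: "sql-task-name"
            sql_task:
              warehouse_id: "<sql-warehouse-id>"
              query:
                query_id: "<saved-query-id>"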

(2) If you want a Python project that reuses SQL statements from a static file, you can add this file to your Python package and then read and run it from your package code, e.g.:

sql_statement = ... # code to read from the file
spark.sql(sql_statement)
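As a concrete but hypothetical sketch of that approach: if the SQL file is shipped as package data inside a package called my_package (a placeholder name), it could be read with importlib.resources and executed statement by statement:

from importlib.resources import files
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# read data.sql, which is bundled inside my_package as package data
sql_statement = files("my_package").joinpath("data.sql").read_text()

# spark.sql runs one statement at a time, so split on ';' and skip empty parts
for statement in sql_statement.split(";"):
    if statement.strip():
        spark.sql(statement)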

(3) A third option is to use the DBT framework with Databricks. In this case you would probably want to use dbt_task, as described in the dbx documentation.
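For illustration only, a dbt task in the deployment file might look roughly like this (commands is the list of dbt CLI commands the task runs; the task_key is a placeholder, and the exact fields depend on your dbt project setup):

        tasks:
          - task_key: "dbt-task-name"
            dbt_task:
              commands:
                - "dbt deps"
                - "dbt run"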

andKaae

I found a simple workaround (though it might not be the prettiest): change data.sql to a Python file and run the queries using Spark. This way I could keep using the same spark_python_task.
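A minimal sketch of what such a converted Python file could look like (the table names here are just placeholders; each statement that used to live in data.sql becomes its own spark.sql call):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# statements that previously lived in data.sql, executed one at a time
spark.sql("CREATE TABLE IF NOT EXISTS my_schema.daily_table (id INT, value STRING)")
spark.sql("INSERT INTO my_schema.daily_table SELECT id, value FROM my_schema.source_table")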