I could not find any answers in the Databricks documentation or in the current databricks-cli repository, and I ran into a problem during my migration from a dbx setup. The migration example in the documentation is quite minimal and does not cover other aspects of the deployment, such as parameter files for the jobs.

My use case is the bundle deployment of a Python wheel job with parameters passed as a file:
```yaml
# The main job for package_name
artifacts:
  package_wheel:
    build: poetry build
    path: ..
    type: whl
  config_file:
    build: echo Under build
    files:
      - source: test_conf.yaml
    path: ../conf
    type: yaml

resources:
  jobs:
    package_name_job:
      name: package_name_job

      schedule:
        quartz_cron_expression: '44 37 8 * * ?'
        timezone_id: Europe/Amsterdam

      email_notifications:
        on_failure:
          - [email protected]

      tasks:
        - task_key: main_task
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: package_name
            entry_point: main
            parameters: ["--conf-file", config_file]
          libraries:
            # By default we just include the .whl file generated for the package_name package.
            # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
            # for more information on how to add other libraries.
            - whl: ../dist/*.whl

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: n1-standard-4
            autoscale:
              min_workers: 1
              max_workers: 4
```
I just want to configure the job so that it deploys with different config files depending on the targets described in my databricks.yml file. However, I am not able to make databricks-cli automatically recognize those files as artifacts and upload them to the `.bundle/[package_name]/[target]/files` path, in the same way that the built wheel is copied/uploaded to `.bundle/[package_name]/[target]/artifacts`.

I tried to define the config file as an artifact and use a reference to it, but it does not work:
```yaml
# The main job for package_name
artifacts:
  ...
  config_file:
    build: echo Under build
    files:
      - source: test_conf.yaml
    path: ../conf
    type: yaml

resources:
  jobs:
    package_name_job:
      name: package_name_job
      ...
      tasks:
        - task_key: main_task
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: package_name
            entry_point: main
            parameters: ["--conf-file", ${artifacts.config_file}] # <-- Reference as in Terraform?
      ...
```
I figured it out ✅
The trick was to use the `sync` configuration parameter inside the `targets` definition in the `databricks.yml` bundle definition file.

Just to clarify, my Python package is handled by [poetry](https://python-poetry.org), so I have this project structure:
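Roughly, the layout looks like this (simplified; the `resources/` folder name and the `dev`/`prod` target names are just examples of the idea, not fixed conventions):

```
.
├── databricks.yml
├── pyproject.toml
├── src/
│   └── package_name/
├── resources/
│   └── package_name_job.yml
└── conf/
    ├── dev/
    │   └── test_conf.yaml
    └── prod/
        └── test_conf.yaml
```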
**databricks.yml**
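A simplified sketch of the relevant part, with two example targets (`dev` and `prod`); the `sync.include` patterns are illustrative rather than a literal copy of the file:

```yaml
bundle:
  name: package_name

include:
  - resources/*.yml

targets:
  dev:
    default: true
    # Per-target sync configuration: the dev conf folder is uploaded to this
    # target's .bundle/package_name/dev/files path along with the other bundle files.
    sync:
      include:
        - conf/dev/*
  prod:
    sync:
      include:
        - conf/prod/*
```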
Then, I use the reference in the job definition:

**package_name_job.yml**
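For illustration, one way to write that reference is with the built-in `${workspace.file_path}` and `${bundle.target}` substitutions, so the same task definition resolves to whichever config file was synced for the active target (the exact path pattern is an example):

```yaml
resources:
  jobs:
    package_name_job:
      tasks:
        - task_key: main_task
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: package_name
            entry_point: main
            parameters:
              - "--conf-file"
              # Resolves to the synced copy for the active target, e.g.
              # .../.bundle/package_name/dev/files/conf/dev/test_conf.yaml
              - "${workspace.file_path}/conf/${bundle.target}/test_conf.yaml"
```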
Creating new folders with the same configuration file name for each target allows me to deploy the job with a different configuration file depending on the target while using the same logic, thus not modifying my source code, just the configuration.