I could not find an answer in the Databricks documentation or even in the current databricks-cli repository, and I ran into a problem during my migration from a dbx setup. The migration example in the documentation is quite minimal and does not cover other aspects of the deployment, such as parameter files for the jobs.

My use case is the bundle deployment of a Python wheel job with parameters passed as a file:
```yaml
# The main job for package_name
artifacts:
  package_wheel:
    build: poetry build
    path: ..
    type: whl
  config_file:
    build: echo Under build
    files:
      - source: test_conf.yaml
    path: ../conf
    type: yaml

resources:
  jobs:
    package_name_job:
      name: package_name_job

      schedule:
        quartz_cron_expression: '44 37 8 * * ?'
        timezone_id: Europe/Amsterdam

      email_notifications:
        on_failure:
          - [email protected]

      tasks:
        - task_key: main_task
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: package_name
            entry_point: main
            parameters: ["--conf-file", config_file]
          libraries:
            # By default we just include the .whl file generated for the package_name package.
            # See https://docs.databricks.com/dev-tools/bundles/library-dependencies.html
            # for more information on how to add other libraries.
            - whl: ../dist/*.whl

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            spark_version: 13.3.x-scala2.12
            node_type_id: n1-standard-4
            autoscale:
              min_workers: 1
              max_workers: 4
```
I just want to configure the job to deploy with different config files depending on the targets described in my `databricks.yaml` file. However, I am not able to make databricks-cli automatically recognize those files as artifacts and upload them to the `.bundle/[package_name]/[target]/files` path, the way the built wheel is copied/uploaded to `.bundle/[package_name]/[target]/artifacts`.

I tried to define the config file as an artifact and use a reference to it, but it does not work:
```yaml
# The main job for package_name
artifacts:
  ...
  config_file:
    build: echo Under build
    files:
      - source: test_conf.yaml
    path: ../conf
    type: yaml

resources:
  jobs:
    package_name_job:
      name: package_name_job
      ...
      tasks:
        - task_key: main_task
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: package_name
            entry_point: main
            parameters: ["--conf-file", ${artifacts.config_file}] # <-- Reference as in Terraform?
          ...
```
I figured it out ✅

The trick was to use the `sync` configuration parameter inside the `targets` definition of the `databricks.yml` bundle definition file. Just to clarify, my Python package is handled by poetry, so I have this project structure:
`databricks.yml`
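The original `databricks.yml` is not reproduced here, so the following is only a minimal sketch of how a per-target `sync` section can look; the workspace hosts, the `dev`/`prod` target names, and the `conf/dev` / `conf/prod` folder names are assumptions:

```yaml
# databricks.yml — minimal sketch; hosts, target names and conf folders are placeholders.
bundle:
  name: package_name

include:
  - resources/*.yml

targets:
  dev:
    default: true
    workspace:
      host: https://my-dev-workspace.cloud.databricks.com
    sync:
      paths:
        # Upload only the dev configuration folder to the bundle's files path.
        - ./conf/dev
  prod:
    workspace:
      host: https://my-prod-workspace.cloud.databricks.com
    sync:
      paths:
        # Upload only the prod configuration folder to the bundle's files path.
        - ./conf/prod
```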
Then, I use the reference in the job definition, `package_name_job.yml`:
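Again, this is only a sketch of how the parameter can point at the synced file: the exact workspace path depends on how the `sync` section lays files out under the bundle's files root, and the `${workspace.file_path}` / `${bundle.target}` substitutions are one way to keep a single definition valid for every target:

```yaml
# package_name_job.yml — sketch of the task; the workspace path is an assumption.
resources:
  jobs:
    package_name_job:
      name: package_name_job
      tasks:
        - task_key: main_task
          job_cluster_key: job_cluster
          python_wheel_task:
            package_name: package_name
            entry_point: main
            # The synced configuration lands under the bundle's files root in the
            # workspace, so it can be passed to the wheel as an absolute path.
            parameters:
              - "--conf-file"
              - ${workspace.file_path}/conf/${bundle.target}/test_conf.yaml
```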
By creating folders with the same configuration file name for each target, I can deploy the job with a different configuration file depending on the target while keeping the same logic, so I don't modify my source code, only the configuration.
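With that in place, deploying an environment is just a matter of picking the target, e.g. `databricks bundle deploy -t dev` or `databricks bundle deploy -t prod`; each deployment uploads its own configuration file without any change to the package code.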