I want to do something pretty simple here: import a module from the local filesystem using Databricks Asset Bundles. These are the relevant files:
databricks.yml
bundle:
  name: my_bundle

workspace:
  host: XXX

targets:
  dev:
    mode: development
    default: true

resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: my_task
          existing_cluster_id: YYY
          spark_python_task:
            python_file: src/jobs/bronze/my_script.py
my_script.py
from src.jobs.common import *

if __name__ == "__main__":
    hello_world()
common.py
def hello_world():
    print("hello_world")
And the following folder structure:
databricks.yml
src/
├── __init__.py
└── jobs
├── __init__.py
├── bronze
│ └── my_script.py
└── common.py
I'm deploying this to my workspace and running it with Databricks CLI v0.206.0, using the following commands:
databricks bundle validate
databricks bundle deploy
databricks bundle run my_job
I'm having trouble importing my common.py module. I'm getting the classic ModuleNotFoundError: No module named 'src' error here.
I've added the __init__.py
files as I typically do when doing this locally, and tried the following variations:
from src.jobs.common import *
from jobs.common import *
from common import *
from ..common import *
I guess my issue is that I don't really know what the Python path is here, since I'm deploying to Databricks. How can I do something like this using Databricks Asset Bundles?
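To show what I mean, something like the following at the top of my_script.py (just standard sys/os introspection, nothing Databricks-specific) would print what the interpreter actually sees at runtime:

import os
import sys

print("sys.path:", sys.path)                 # directories Python searches for imports
print("script:", os.path.abspath(__file__))  # where the deployed script ends up
print("cwd:", os.getcwd())                   # working directory of the task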
I recently ran into a similar issue, albeit with notebook tasks, and came to the following resolution, adapted to your example file structure:
In your databricks.yml file, pass an argument to your script via parameters:
databricks.yml
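A minimal sketch of the idea, assuming the bundle root is synced to ${workspace.file_path} (the default file sync target) and forwarded as the task's first positional parameter:

resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: my_task
          existing_cluster_id: YYY
          spark_python_task:
            python_file: src/jobs/bronze/my_script.py
            # forward the workspace path of the deployed bundle files so the
            # script can add it to sys.path before importing src.*
            parameters:
              - ${workspace.file_path}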
main.py
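On the script side, a sketch of how the forwarded path could be consumed before the import, assuming main.py here plays the role of my_script.py in your layout:

import sys

# First positional parameter passed via spark_python_task.parameters:
# the workspace path of the deployed bundle root.
if len(sys.argv) > 1:
    sys.path.append(sys.argv[1])

from src.jobs.common import *  # resolves once the bundle root is on sys.path

if __name__ == "__main__":
    hello_world()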
Caveats: