Databricks "dbutils": run multiple notebooks from one notebook


I have multiple notebooks in Databricks and a master notebook that should run all of them in order to produce the final result. However, I am running into issues, and it appears to have limitations. It may be possible by automating this through a workflow run ID. I know about Databricks Workflows, but I want to know whether it can be done with Databricks dbutils.

I tried to run all the notebooks from a single notebook, but it does not seem possible because of limitations.

There is 1 answer

Answered by Rakesh Govindula

If you want to do it in Scala, you can try the approach mentioned in the comments by @Kashyap.

If you want to do it in Python, you can try the methods below.

Create a list of the notebooks to be executed, with their parameters, and loop over them with the dbutils.notebook.run command.

Sample:

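# Notebooks to run, in order
notebooks = ["Ch1", "Ch2"]
for nb in notebooks:
    # Run each notebook with a 30-second timeout and print its exit value
    print(dbutils.notebook.run(nb, 30))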
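The sample above does not pass any parameters. A minimal sketch of the same loop with per-notebook parameters might look like the following. The `run_date` parameter and the helper name `run_in_order` are hypothetical; the runner is passed in as `run_fn` so that the loop can be tried outside Databricks, and inside a notebook you would pass `dbutils.notebook.run`, which takes a path, a timeout in seconds, and a dictionary of arguments.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical notebook paths and parameters, for illustration only.
NOTEBOOKS: List[Tuple[str, Dict[str, str]]] = [
    ("Ch1", {"run_date": "2024-01-01"}),
    ("Ch2", {"run_date": "2024-01-01"}),
]


def run_in_order(
    notebooks: List[Tuple[str, Dict[str, str]]],
    run_fn: Callable[[str, int, Dict[str, str]], str],
    timeout_seconds: int = 30,
) -> Dict[str, str]:
    """Run each notebook in turn and collect its exit value by path."""
    results: Dict[str, str] = {}
    for path, params in notebooks:
        # run_fn is expected to behave like
        # dbutils.notebook.run(path, timeout_seconds, arguments)
        results[path] = run_fn(path, timeout_seconds, params)
    return results


# Inside a Databricks notebook, where dbutils is predefined:
# results = run_in_order(NOTEBOOKS, dbutils.notebook.run)
```

The value returned for each notebook is whatever that notebook passes to `dbutils.notebook.exit`.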


But this executes the notebooks in a sequential manner.
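If you want to stay inside a notebook, one possible way around this is to submit the calls to a thread pool using standard Python constructs. This is a sketch under assumptions, not part of the original answer: the helper name `run_in_parallel` and the `max_workers` value are hypothetical, and `run_fn` stands in for `dbutils.notebook.run`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def run_in_parallel(
    paths: List[str],
    run_fn: Callable[[str, int], str],
    timeout_seconds: int = 30,
    max_workers: int = 4,
) -> Dict[str, str]:
    """Run the notebooks concurrently and collect each exit value by path."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every notebook first so the runs can overlap.
        futures = {
            path: pool.submit(run_fn, path, timeout_seconds) for path in paths
        }
        # result() blocks until that run finishes and re-raises its exception.
        return {path: future.result() for path, future in futures.items()}


# Inside a Databricks notebook, where dbutils is predefined:
# results = run_in_parallel(["Ch1", "Ch2"], dbutils.notebook.run)
```

All runs share the same cluster, so the useful degree of parallelism depends on the cluster's resources.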

To run them in parallel, one option is an Azure Data Factory (ADF) pipeline with a ForEach activity (uncheck Sequential in it) that uses the Databricks Workspace API:

https://adb-3009531816291561.1.azuredatabricks.net/api/2.0/workspace/list?path=<Notebook folder path>.

  • First, get the list of notebooks in the workspace using a Web activity.
  • Then use a Filter activity to filter out your parent notebook from it, and pass the result to the ForEach activity.
  • Inside the ForEach activity, use the Notebook activity and pass the path of the notebook to it.
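For reference, the request that the Web activity sends in the first step can be sketched in Python as follows. The host, token, and folder values are hypothetical placeholders, the helper name `build_list_request` is made up for illustration, and bearer-token authentication is an assumption about how your workspace is set up.

```python
import urllib.parse
import urllib.request


def build_list_request(host: str, token: str, folder: str) -> urllib.request.Request:
    """Build a GET request for /api/2.0/workspace/list on the given folder."""
    query = urllib.parse.urlencode({"path": folder})
    url = f"https://{host}/api/2.0/workspace/list?{query}"
    # Assumption: the workspace accepts a bearer token in the Authorization header.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical values; replace with your workspace host, token, and folder.
request = build_list_request(
    "adb-example.azuredatabricks.net", "<token>", "/Shared/notebooks"
)
# To actually send it: urllib.request.urlopen(request).read()
```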

[Screenshot: Web activity configuration]

[Screenshot: Web activity result]

Then use a Filter activity to filter out the parent notebook (use the name of your notebook to filter it):

@not(contains(item().path,'Nb1'))
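The same filter can be expressed in Python against the JSON returned by the workspace list call, which may help when checking which paths will reach the ForEach. This is only a sketch: the sample paths are invented, `child_notebook_paths` is a hypothetical helper, and the extra check on `object_type` goes beyond the expression above (it also drops folders).

```python
from typing import Any, Dict, List


def child_notebook_paths(listing: Dict[str, Any], parent_name: str) -> List[str]:
    """Return notebook paths from a workspace listing, excluding the parent."""
    return [
        obj["path"]
        for obj in listing.get("objects", [])
        # Mirrors @not(contains(item().path, 'Nb1')), plus a notebook-only check.
        if obj.get("object_type") == "NOTEBOOK" and parent_name not in obj["path"]
    ]


# Invented sample in the shape of a workspace list response.
sample = {
    "objects": [
        {"path": "/Shared/nbs/Nb1", "object_type": "NOTEBOOK"},
        {"path": "/Shared/nbs/Ch1", "object_type": "NOTEBOOK"},
        {"path": "/Shared/nbs/Ch2", "object_type": "NOTEBOOK"},
        {"path": "/Shared/nbs/data", "object_type": "DIRECTORY"},
    ]
}
print(child_notebook_paths(sample, "Nb1"))
```

Note that a substring match also excludes any other notebook whose path happens to contain the parent's name.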


Pass the output array to the ForEach activity and make sure you uncheck the Sequential checkbox.


Inside ForEach, give the path of the Notebook.


If you want to store the output of every notebook run and use it in a Databricks notebook (the parent notebook), add an Append Variable activity after the Notebook activity inside the ForEach and store the notebook's exit output in it. You will get an array of the outputs of each notebook run.

Now, outside the ForEach, use another Notebook activity to call your parent notebook and pass this array to it as a parameter.
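On the Databricks side, the parent notebook then has to read that parameter. A minimal sketch, assuming the array arrives as a JSON string in a parameter named `outputs` (both the name and the JSON encoding are assumptions about how the pipeline is configured, and `parse_outputs` is a hypothetical helper):

```python
import json
from typing import List


def parse_outputs(raw: str) -> List[str]:
    """Decode a JSON array string into a Python list of notebook outputs."""
    outputs = json.loads(raw)
    if not isinstance(outputs, list):
        raise ValueError("expected a JSON array of notebook outputs")
    return outputs


# Inside the parent notebook, where dbutils is predefined:
# raw = dbutils.widgets.get("outputs")  # hypothetical parameter name
# outputs = parse_outputs(raw)
```

Each child notebook supplies its entry in the array by ending with `dbutils.notebook.exit(value)`.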