I have a container in Azure Blob Storage that holds around 10,000,000 CSV and ZIP files.
I want to use "dbutils.fs.ls" in a Databricks notebook to get a list of the files. However, after running the command and waiting for more than 30 minutes, I got the following error:
The spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.
I used a multi-node cluster with the following configuration:
- Driver: Standard_D4ds_v5
- Worker: Standard_D4ds_v5
- Min Workers: 2
- Max Workers: 8
It seems the cluster cannot handle listing all of the files. Is there a way to push a filter on file names down to the listing itself? I am only interested in files whose names start with "Energy", so filtering at that stage might make it possible to get the list of desired files without hitting the error above.
Use the Azure SDK for Python instead of "dbutils.fs.ls". The "list_blobs" method on a "ContainerClient" accepts a "name_starts_with" argument, so the prefix filter is applied on the service side, and it returns a lazy, paged iterator instead of materializing all 10 million results upfront.
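A minimal sketch, assuming the azure-storage-blob package is installed and using placeholder values for the connection string and container name:

```python
from azure.storage.blob import ContainerClient

# Placeholders -- replace with your storage account's connection string
# and the name of the container that holds the files.
container = ContainerClient.from_connection_string(
    conn_str="<your-connection-string>",
    container_name="<your-container>",
)

# name_starts_with filters server-side, so only blobs whose names begin
# with "Energy" are returned. The result is a lazy pager, not a list.
energy_blobs = container.list_blobs(name_starts_with="Energy")

for blob in energy_blobs:
    print(blob.name)
```

Because the pager fetches results page by page, this can run on the driver (or even a single-node cluster) without holding the full listing in memory; you can collect the matching names into a list or write them out incrementally as needed.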