How to load a CSV file from Azure Blob Storage using DuckDB?


I have CSV files stored in an Azure Blob Storage container. I want to use DuckDB to load this data for analysis.

Can someone guide me through the process of loading a CSV file from an Azure Blob Storage container into DuckDB? I am looking for a Python-based solution or any relevant code examples.

Péter Szilvási (best answer)

First, generate a SAS token for authentication. In the Azure Portal, navigate to your CSV file, right-click it, and select the Generate SAS option. The resulting blob SAS URL should look similar to this: https://<your_storage_account>.blob.core.windows.net/<your_container_name>/<your_csv_file>?<sas_token>
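The URL layout above can also be assembled in code when the account, container, blob name, and token are held separately. This is a minimal sketch: build_blob_sas_url is a hypothetical helper name (not part of DuckDB or any Azure SDK), the sample values are made up, and the token is assumed to be already URL-encoded.

```python
from urllib.parse import quote


def build_blob_sas_url(account: str, container: str, blob_name: str, sas_token: str) -> str:
    """Assemble a blob URL with the SAS token appended as the query string.

    Hypothetical helper for illustration only. The SAS token is assumed
    to be URL-encoded already, so it is appended unchanged.
    """
    # A copied token may start with "?"; strip it to avoid "??" in the URL.
    token = sas_token.lstrip("?")
    # Percent-encode the blob path but keep "/" so virtual folders survive.
    path = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{path}?{token}"


# Made-up sample values, not real credentials.
url = build_blob_sas_url("myaccount", "mycontainer", "data/sales 2024.csv", "?sv=2022-11-02&sig=abc")
print(url)
```

The helper only concatenates strings; it does not check that the token is valid or unexpired.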

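Pass the blob URL, including the SAS token, to DuckDB's read_csv_auto function, which reads CSV data directly from the URL. Reading over HTTPS relies on DuckDB's httpfs extension, which recent versions typically load automatically.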

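import duckdb

connection = duckdb.connect("azure_blob.db")  # persistent database file
# Replace <table_name> and <blob_sas_url>; SQL string literals take single quotes.
query = "CREATE TABLE IF NOT EXISTS <table_name> AS SELECT * FROM read_csv_auto('<blob_sas_url>')"
connection.execute(query)
rows = connection.execute("SELECT * FROM <table_name>").fetchall()  # fetch the loaded rows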

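DuckDB fetches the CSV data directly from Azure Blob Storage through the URL. The SAS token in the query string authorizes the request, so no separate Azure credentials need to be configured in DuckDB.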
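If the table name or URL comes from variables rather than being typed into the query, quoting becomes important, since a SAS URL is pasted into the SQL text as a string literal. Here is a rough sketch of one way to do that. The helper names (quote_identifier, quote_literal, create_table_sql, load_csv) are hypothetical, and the connection is assumed to be whatever duckdb.connect(...) returns.

```python
def quote_identifier(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_table_sql(table_name: str, csv_url: str) -> str:
    """Build the CREATE TABLE statement used in the answer above."""
    return (
        f"CREATE TABLE IF NOT EXISTS {quote_identifier(table_name)} AS "
        f"SELECT * FROM read_csv_auto({quote_literal(csv_url)})"
    )


def load_csv(connection, table_name: str, csv_url: str) -> int:
    """Create the table from the CSV at csv_url and return its row count.

    `connection` is assumed to be a DuckDB connection, for example the
    result of duckdb.connect("azure_blob.db").
    """
    connection.execute(create_table_sql(table_name, csv_url))
    count_sql = f"SELECT count(*) FROM {quote_identifier(table_name)}"
    return connection.execute(count_sql).fetchone()[0]


# Show the generated statement for placeholder values (no network access).
print(create_table_sql("sales", "<blob_sas_url>"))
```

With DuckDB installed, a call such as load_csv(duckdb.connect("azure_blob.db"), "sales", blob_sas_url) would run the same two statements as the answer and report how many rows were loaded.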

Note: You can also generate a SAS token using the az storage container generate-sas command.
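For a scripted workflow, the CLI command can be driven from Python. This is only a sketch under several assumptions: the Azure CLI is installed and already authorized for the storage account (for example through az login or an account key in the environment), read-only permission is enough, and the function names and sample values are hypothetical.

```python
import subprocess


def generate_sas_command(account: str, container: str, expiry: str) -> list[str]:
    """Build the argument list for `az storage container generate-sas`.

    `expiry` is a UTC timestamp such as "2030-01-01T00:00Z".
    """
    return [
        "az", "storage", "container", "generate-sas",
        "--account-name", account,
        "--name", container,
        "--permissions", "r",   # read-only access to blobs in the container
        "--expiry", expiry,
        "--https-only",
        "--output", "tsv",      # print the bare token without JSON quoting
    ]


def generate_container_sas(account: str, container: str, expiry: str) -> str:
    """Run the Azure CLI and return the SAS token it prints.

    Raises subprocess.CalledProcessError if the CLI exits with an error.
    """
    result = subprocess.run(
        generate_sas_command(account, container, expiry),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


# Print the command for made-up sample values without running it.
print(" ".join(generate_sas_command("myaccount", "mycontainer", "2030-01-01T00:00Z")))
```

The returned token is scoped to the container, so it can be appended after the ? of any blob URL in that container.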