How to read csv files in dbfs using Spark SQL only?

67 views Asked by At

I am working on databricks interactive cluster and I want to query csv files stored in ADLS or DBFS using SQL.

Issue:

I can easily do that for delta table and parquet files.However, when I do this for csv with specific delimiters I get all the columns loaded as one column.

%sql SELECT *
from delta.`dbfs:/path/to/delta/table`

^^ displays output correctly

%sql SELECT *
from csv.`dbfs:/path/to/file.csv`

^^ displays output as one column without headers. Basically it needs some options passed to specify sep and headers

Potential Solutions:

I can easily do this in spark, spark.read.csv("dbfs:/path/to/file.csv", header=True, sep="|") but how can I do this in Spark SQL (magic command)?

I can create a table specifying options

%sql CREATE TABLE schema.table
USING CSV
OPTIONS (path "dbfs:/path/to/file.csv", header "true", delimiter "|")

and then I can query the data.

But is there a way to just select * the csv data without creating these intermediate tables?

1

There are 1 answers

0
DileeprajnarayanThumula On

Directly executing an SQL query on a CSV file without using spark.read is not supported in Databricks.

To access a file stored in DBFS (Databricks File System) using Databricks SQL, you first need to create a table or view that references the file. This process involves copying the data from the CSV file into the table or view. Once the table or view is created, you can query it to retrieve the data from the file.

I have tried your approach I am able to read the data from the CSV using the header OPTIONS (header "true", delimiter ",")

But I am getting an ERROR %sql SELECT * FROM csv.dbfs:/FileStore/tables/sample.csv

_c0 _c1 _c2
Name    Age Country
Emma    28  Australia
Name    Age Country
Alice   30  Canada
Name    Age Country
Michael 35  UK
Name    Age Country
John    25  USA

I have tried the below approach

%sql
CREATE OR REPLACE  TEMPORARY VIEW  sample_view
USING csv
OPTIONS (path  "dbfs:/FileStore/tables/sample.csv", header "true", inferSchema "true", delimiter ",")

Results:

Name    Age Country
Emma    28  Australia
Alice   30  Canada
Michaeference:el    35  UK
John    25  USA

Reference: Databricks - Read CSV file from folder