Azure Databricks - How to dynamically read the new file from the mount point?


I have the below mount point created:

dbutils.fs.ls("/mnt/mount/raw2")

and have the below file available in the above mount point:

customer1.csv
+---+------+------+
| ID|  Name| Place|
+---+------+------+
|101|  Hari|   Tcr|
|102|  John|   Bgr|
+---+------+------+

Now I am reading the file into a DataFrame:

df = spark.read.option("header", "true").csv("/mnt/mount/raw2")

This reads the customer1.csv file.

Now I am writing this into a delta table:

df.write.format("delta").mode("overwrite").save("/mnt/mount/raw2/customer_data")

Then I receive new data in the mount point:

customer2.csv with:
+---+------+------+
|103|Stefen|   Hyd|
|104| Devid|   Bgr|
|105| Wager|London|
+---+------+------+

Now I want to append the data into the same delta location, customer_data.

So what is the best way to dynamically read only the newly arrived file from the same mount point?

So I am looking for a scenario like the below:

existing_delta_table =
+---+------+------+
| ID|  Name| Place|
+---+------+------+
|101|  Hari|   Tcr|
|102|  John|   Bgr|
+---+------+------+

Newly arrived file:

df = spark.read.option("header", "true").csv("/mnt/mount/raw2")

# pseudocode
if df is new_file:
    existing_delta_table.append(df_records)

There is 1 answer

Answered by JayashankarGS:

You can use the Auto Loader concept. Follow the steps below.

I have started with customer1.csv in the mount location.


Next, run the code below:

# Incrementally read new CSV files from the mount point with Auto Loader
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/checkpoint_path/")
      .load("/mnt/mount/raw2/"))

# Process all files available right now, then stop
(df.writeStream
   .option("checkpointLocation", "/checkpointLocation/")
   .format("delta")
   .trigger(availableNow=True)
   .start("/mnt/mount/raw2_customer_data")
   .awaitTermination())

Here you need to give the delta table a path outside the source directory; otherwise the stream would try to ingest the files it writes.


Now the delta table contains the records from customer1.csv.


Next, add customer2.csv to the mount point.


Again, run the above code; it checks for new files and appends only their records.


And the delta table now contains all five records.
