Azure Databricks - How to dynamically read the new file from the mount point?


I have the below mount point created:

dbutils.fs.ls("/mnt/mount/raw2")

and have the below file available in the above mount point:

customer1.csv
+---+------+------+
| ID|  Name| Place|
+---+------+------+
|101|  Hari|   Tcr|
|102|  John|   Bgr|
+---+------+------+

Now I am reading the file into a DataFrame:

df = spark.read.option("header", "true").csv("/mnt/mount/raw2")

This reads the customer1.csv file.

Now I am writing this into a delta table:

df.write.format("delta").mode("overwrite").save("/mnt/mount/raw2/customer_data")

Then I receive new data in the mount point:

customer2.csv with:
+---+------+------+
|103|Stefen|   Hyd|
|104| Devid|   Bgr|
|105| Wager|London|
+---+------+------+

Now I want to append the data into the same delta location, customer_data.

So what is the best way to dynamically read only the newly arrived file from the same mount point?

So I am looking for a scenario like the below:

existing_delta_table =
+---+------+------+
| ID|  Name| Place|
+---+------+------+
|101|  Hari|   Tcr|
|102|  John|   Bgr|
+---+------+------+

Newly arrived file:

df = spark.read.option("header", "true").csv("/mnt/mount/raw2")

# pseudocode
if df is new_file:
    existing_delta_table.append(df_records)

There is 1 answer

Answered by JayashankarGS:

You can use the Auto Loader concept. Follow the steps below.

I have started with customer1.csv in the mount location.


Next, run the code below:

# Incrementally read new CSV files from the mount point with Auto Loader
df = (spark.readStream.format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("cloudFiles.schemaLocation", "/checkpoint_path/")
      .load("/mnt/mount/raw2/"))

# Process all files available right now, then stop
(df.writeStream
   .option("checkpointLocation", "/checkpointLocation/")
   .format("delta")
   .trigger(availableNow=True)
   .start("/mnt/mount/raw2_customer_data")
   .awaitTermination())

Here you need to give the delta table a path outside the source directory; otherwise the stream would try to ingest the files it writes.


Now the delta table contains the records from customer1.csv.


Next, add customer2.csv to the mount point.


Again, run the above code; it checks for new files and appends only their records.


And the delta table now contains all five records.
