Reading Data Incrementally from S3 in Delta Format


I am working on a project where data is stored on Amazon S3 in Delta format, and I need to read it incrementally. I am running into difficulties implementing this and would appreciate guidance or insights from the community. My current approach is to parse the transaction-log JSON metadata to find out which data at the location has been modified.
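For context on that approach: each commit to a Delta table is recorded as a line-delimited JSON file under `_delta_log/` in the table directory, named by the zero-padded version number (`00000000000000000000.json`, `00000000000000000001.json`, ...), and each line is an action such as `add` or `remove` describing a data file. A minimal Java sketch of scanning one commit file's contents for added files follows; the class and method names are my own, and the string scan is only illustrative (real code should use a proper JSON parser such as Jackson):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: inspect one Delta transaction-log commit (_delta_log/<version>.json)
// to find the data files it added. Names are illustrative, not a real API.
public class DeltaLogSketch {

    // Commit file names are the table version zero-padded to 20 digits.
    public static String commitFileName(long version) {
        return String.format("%020d.json", version);
    }

    // Extract the "path" value from every line that is an "add" action.
    public static List<String> addedPaths(String commitFileContents) {
        List<String> paths = new ArrayList<>();
        for (String line : commitFileContents.split("\n")) {
            if (!line.contains("\"add\"")) continue;   // skip remove/metadata actions
            int key = line.indexOf("\"path\":\"");
            if (key < 0) continue;
            int start = key + "\"path\":\"".length();
            int end = line.indexOf('"', start);
            if (end > start) paths.add(line.substring(start, end));
        }
        return paths;
    }
}
```

With this layout, "incremental" reading amounts to remembering the last version you processed and checking S3 for `_delta_log/` + `commitFileName(lastVersion + 1)` on each poll.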

What I've Tried:

Delta Lake Documentation: I have read the Delta Lake documentation to understand best practices for reading data incrementally. However, there is no concrete information about reading Delta-format data stored on S3 (or other file sources), although there is a lot about Delta SQL.

I expect to retrieve data incrementally from the S3 location in Delta format. Ideally, I would like suggestions on implementing this scenario.

Environment Details:

Delta Lake Version: 1.0.0
AWS SDK/Library Version: 1.11.375
Programming Language: Java
Spark Version: 3.1.2


1 Answer

Answered by Kashyap

You can either use Change Data Feed (CDF), enabled on the table with:

CREATE TABLE student (id INT, name STRING, age INT) TBLPROPERTIES (delta.enableChangeDataFeed = true)

and then read the changes with:

val df = spark.read.format("delta")
  .option("readChangeFeed", "true")
  .option("startingVersion", 0)
  .table("student")
or use a streaming read of the table, which tracks its own progress between runs:

val df = spark.readStream.format("delta")
  .load("/tmp/delta/events")

Equivalently, using the io.delta.implicits helper:

import io.delta.implicits._
val df = spark.readStream.delta("/tmp/delta/events")
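Since the asker is on Java, here is roughly what the streaming read above looks like with Spark's Java API. This is a sketch, not a tested setup for the asker's versions: the S3 path, bucket, and checkpoint location are placeholders, and the checkpoint is what makes the read incremental across restarts (each micro-batch picks up only files added since the last committed version).

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class IncrementalDeltaRead {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
            .appName("incremental-delta-read")
            .getOrCreate();

        // Streaming read of the Delta table; path is a placeholder.
        Dataset<Row> events = spark.readStream()
            .format("delta")
            .load("s3a://my-bucket/delta/events");

        // Progress (last processed table version) is persisted in the
        // checkpoint, so restarts resume where the previous run stopped.
        events.writeStream()
            .format("console")
            .option("checkpointLocation", "s3a://my-bucket/checkpoints/events")
            .start()
            .awaitTermination();
    }
}
```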