I am working on a project where the data is stored in Delta format on Amazon S3, and I need to read it incrementally. I am having trouble implementing this and would appreciate guidance or insights from the community. My current approach is to inspect the JSON files of the Delta transaction log to find out which data at that location has been modified.
What I've Tried:
Delta Lake Documentation: I have read the Delta Lake documentation to understand the best practices for reading data incrementally. However, I could not find concrete information about working with Delta-format data stored on S3 or other file-based sources, although there is a lot about Delta SQL.
I expect to retrieve data incrementally from the S3 location in Delta format. Ideally, I would like suggestions on implementing this scenario.
Environment Details:
Delta Lake Version: 1.0.0
AWS SDK/Library Version: 1.11.375
Programming Language: Java
Spark Version: 3.1.2
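To make the transaction-log approach from the question more concrete, here is a minimal sketch in plain Java of one piece of it: deciding which commit files are new since the last run. It relies on the Delta convention that each commit is a JSON file in the table's _delta_log directory, named with the table version zero-padded to 20 digits. The class name DeltaLogVersions, its method names, and the idea of persisting a "last processed version" somewhere are assumptions made for illustration, not part of any Delta Lake API. Listing the S3 prefix and parsing the actions inside each commit file are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: not part of Delta Lake, just an illustration.
public class DeltaLogVersions {

    // Commit files look like 00000000000000000012.json (20 digits).
    // Checkpoints (*.checkpoint.parquet), *.crc and _last_checkpoint do not match.
    private static final Pattern COMMIT_FILE = Pattern.compile("^(\\d{20})\\.json$");

    /** File name of the commit that produced the given table version. */
    public static String commitFileName(long version) {
        return String.format(Locale.ROOT, "%020d.json", version);
    }

    /** Version encoded in a commit file name, or -1 if the name is not a commit file. */
    public static long versionOf(String fileName) {
        Matcher m = COMMIT_FILE.matcher(fileName);
        return m.matches() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** Versions newer than lastProcessedVersion, in ascending order. */
    public static List<Long> newVersions(List<String> logFileNames, long lastProcessedVersion) {
        List<Long> versions = new ArrayList<>();
        for (String name : logFileNames) {
            long version = versionOf(name);
            if (version > lastProcessedVersion) {
                versions.add(version);
            }
        }
        Collections.sort(versions);
        return versions;
    }

    public static void main(String[] args) {
        // File names as they might come back from listing <table>/_delta_log/ on S3.
        List<String> names = Arrays.asList(
                "00000000000000000002.json",
                "00000000000000000000.json",
                "00000000000000000001.json",
                "00000000000000000002.crc",
                "00000000000000000010.checkpoint.parquet",
                "_last_checkpoint");

        // Assume version 0 was already processed in an earlier run.
        System.out.println(newVersions(names, 0L)); // prints [1, 2]
    }
}
```

Each selected commit file holds one JSON action per line (for example add, remove, metaData, commitInfo). Collecting only the paths of add actions is usually not enough, because updates, deletes, and compaction rewrite files and record the old ones as remove actions, so a reader built this way has to handle both. Delta Lake also documents using a table as a Structured Streaming source (spark.readStream().format("delta")), which keeps track of processed versions through a checkpoint and may be simpler than reading the log by hand.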
You can either use:
and