Handling new data in the "Raw" zone of a Lakehouse


So I understand there are 3 storage layers in a Data Lakehouse (Delta Lake) architecture: Raw, Enriched, and Curated. The Raw store seems to hold all the data as it comes in, in its native format, say Parquet. The data then gets cleaned and given some structure in the Enriched store.

My question is: when a new batch of data comes in (say it contains new records as well as updates to existing records), how can we keep the Raw store up to date? Ideally we would append the new records and update the existing ones, but Parquet files in the Raw store are immutable. How can we push new data to the Raw store incrementally?
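
To make the scenario concrete, here is a minimal sketch of one possible approach, not a definitive answer. It assumes the Raw store is treated as append-only: every batch lands as a new immutable file under its own batch directory, and "updates" are resolved later by keeping the newest version of each record. To stay dependency-free it writes JSON lines as a stand-in for Parquet. The names `land_batch`, `latest_records`, the `batch_id=` directory layout, and the `id` key column are all hypothetical, and batch ids are assumed to sort chronologically.

```python
import json
import tempfile
from pathlib import Path


def land_batch(raw_root, batch_id, records):
    """Write one incoming batch as a new file; older files are never modified."""
    batch_dir = raw_root / f"batch_id={batch_id}"
    # exist_ok=False: fail loudly instead of overwriting an already landed batch.
    batch_dir.mkdir(parents=True, exist_ok=False)
    out_file = batch_dir / "part-0000.json"  # stand-in for a Parquet part file
    with out_file.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record) + "\n")
    return out_file


def latest_records(raw_root, key="id"):
    """Replay batches in batch_id order and keep the last version of each key."""
    latest = {}
    # Assumption: batch ids (ISO dates here) sort in arrival order.
    for batch_dir in sorted(raw_root.glob("batch_id=*")):
        for part in sorted(batch_dir.glob("*.json")):
            with part.open(encoding="utf-8") as fh:
                for line in fh:
                    record = json.loads(line)
                    latest[record[key]] = record  # later batches win
    return latest


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        raw_root = Path(tmp) / "raw" / "customers"
        raw_root.mkdir(parents=True)
        # First batch: two new records.
        land_batch(raw_root, "2024-01-01",
                   [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}])
        # Second batch: one update (id 2) and one new record (id 3).
        land_batch(raw_root, "2024-01-02",
                   [{"id": 2, "city": "Milan"}, {"id": 3, "city": "Lima"}])
        return latest_records(raw_root)


if __name__ == "__main__":
    for record in demo().values():
        print(record)
```

In this sketch the Raw store keeps both versions of record 2, so the full history stays available, and only the downstream view shows the updated value. With Delta Lake, that "latest version" step would more likely be a `MERGE` into an Enriched table than a hand-written replay like this one.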
