Real-time replication between PostgreSQL and Delta.io tables


I am new to data engineering but an experienced developer. I am currently evaluating this use case:

We are looking for an open-source/free solution for the first POC.

What we have drawn up so far is:

PostgreSQL -> writes WAL -> read by the Debezium PostgreSQL connector -> sent to Kafka -> consumed by Spark Structured Streaming -> populates Delta.io tables
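
To make the last hop concrete, here is a minimal sketch of how a Debezium change event might be turned into a row-level action before it is applied to the Delta table. It is plain Python rather than Spark code, and it assumes the default Debezium JSON envelope (`payload.before`, `payload.after`, `payload.op`). The function name `to_row_action` and the sample columns are made up for illustration.

```python
import json

# Debezium operation codes: c = create, r = snapshot read, u = update, d = delete.
UPSERT_OPS = {"c", "r", "u"}


def to_row_action(raw_event: str):
    """Turn one Debezium JSON event into an (action, row) pair.

    Hypothetical helper: assumes the default envelope with a top-level
    "payload"; events without one are treated as already unwrapped.
    """
    event = json.loads(raw_event)
    payload = event.get("payload", event)
    op = payload["op"]
    if op in UPSERT_OPS:
        # Inserts, snapshot reads and updates carry the new row in "after".
        return "upsert", payload["after"]
    if op == "d":
        # Deletes only carry the old row in "before".
        return "delete", payload["before"]
    raise ValueError(f"unsupported Debezium op: {op!r}")


# Illustrative event for a hypothetical table with columns id and name.
sample = json.dumps({
    "payload": {
        "before": None,
        "after": {"id": 1, "name": "Ada"},
        "op": "c",
    }
})
action, row = to_row_action(sample)
print(action, row)
```

In the real pipeline this logic would presumably live inside the Spark Structured Streaming job, for example in a `foreachBatch` function that merges each micro-batch into the Delta table.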

The issue with this architecture is that Delta.io enforces strict schema validation: https://docs.delta.io/latest/delta-batch.html#schema-validation

We expect schema changes in the PostgreSQL source, such as renamed columns and new columns.

How can we make these schema changes automatic? Is there a tool in Spark or Apache Airflow that applies such schema changes automatically through DML, or that generates the right Spark code to run the DDL? (https://docs.delta.io/latest/delta-batch.html#update-table-schema)
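
As a sketch of the "generate the DDL ourselves" route, the snippet below compares the columns of the Delta table with the columns seen in an incoming event and emits `ALTER TABLE` statements. It is plain Python; `plan_schema_ddl`, the `renames` mapping, and the table and column names are hypothetical. One assumption worth calling out: a rename cannot be inferred from a plain schema diff (it looks like one dropped column plus one new column), so renames are passed in explicitly. As far as we understand, `RENAME COLUMN` on a Delta table also requires column mapping to be enabled on that table.

```python
def plan_schema_ddl(table, current_cols, incoming_cols, renames=None):
    """Return ALTER TABLE statements that bring `table` in line with incoming_cols.

    current_cols / incoming_cols: dicts of column name -> SQL type.
    renames: explicit old_name -> new_name mapping (hypothetical input; a
    rename is indistinguishable from drop + add in a plain diff).
    """
    renames = renames or {}
    statements = []
    known = dict(current_cols)

    # Apply the explicitly declared renames first.
    for old, new in renames.items():
        if old in known and new in incoming_cols:
            statements.append(f"ALTER TABLE {table} RENAME COLUMN {old} TO {new}")
            known[new] = known.pop(old)

    # Whatever is still unknown after the renames is treated as a new column.
    added = [(name, typ) for name, typ in incoming_cols.items() if name not in known]
    if added:
        cols = ", ".join(f"{name} {typ}" for name, typ in added)
        statements.append(f"ALTER TABLE {table} ADD COLUMNS ({cols})")
    return statements


ddl = plan_schema_ddl(
    "customers",
    current_cols={"id": "BIGINT", "name": "STRING"},
    incoming_cols={"id": "BIGINT", "full_name": "STRING", "email": "STRING"},
    renames={"name": "full_name"},
)
for statement in ddl:
    print(statement)
```

Each statement could then be passed to `spark.sql(...)` before the micro-batch is written. For purely additive changes, Delta's `mergeSchema` write option may already be enough, so the open question is mainly how to handle renames.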

I have read the docs and am looking for advice.


There are 0 answers