I've been trying to find online documentation or blogs on approaches to validating end-to-end CDC capture completeness, aka "Data Reconciliation". At my company we use Debezium for both PG and Mongo to capture change streams and replicate them to our Snowflake DWH via Kafka. Are there specific techniques to verify that the WAL or oplog maps 100% to captured events? Maybe by exposing primitives to count / checksum WAL / oplog operations as metrics / metadata fields that can be compared against change event counts? While there are a couple of offerings that purport to help with this (e.g. BryteFlow, Redgate), I'm curious to learn whether there are custom or open source approaches to this problem, and whether there are any online resources I may have missed.
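To make concrete the kind of check I'm imagining, here's a minimal sketch of the state-based variant: count plus an order-independent checksum over a bounded window, computed on both sides and compared. Everything in it is a placeholder I made up for illustration (the `orders` table, `id` / `updated_at` columns, connection details, the 15-minute "replication should have caught up by now" allowance), not production code, and the hex-to-integer idioms may need adjusting per engine.

```python
"""
Sketch: count + checksum reconciliation probe between a Postgres source
table and its Snowflake replica. All identifiers and connection details
are hypothetical placeholders.
"""
from datetime import datetime, timedelta, timezone

import psycopg2                  # pip install psycopg2-binary
import snowflake.connector       # pip install snowflake-connector-python

# Reconcile a closed window that replication should already have caught up
# on (here: the hour ending 15 minutes ago) to avoid counting in-flight events.
WINDOW_END = datetime.now(timezone.utc) - timedelta(minutes=15)
WINDOW_START = WINDOW_END - timedelta(hours=1)

# Order-independent checksum: sum the first 7 hex chars of md5(id) as an
# integer. The hex-to-int expression differs between the two engines.
PG_SQL = """
    SELECT count(*),
           coalesce(sum(('x' || substr(md5(id::text), 1, 7))::bit(28)::int), 0)
    FROM orders
    WHERE updated_at >= %s AND updated_at < %s
"""

SF_SQL = """
    SELECT count(*),
           coalesce(sum(to_number(upper(substr(md5(id::varchar), 1, 7)), 'XXXXXXX')), 0)
    FROM orders
    WHERE updated_at >= %s AND updated_at < %s
"""


def probe_postgres():
    with psycopg2.connect("dbname=app user=recon host=pg.internal") as conn:
        with conn.cursor() as cur:
            cur.execute(PG_SQL, (WINDOW_START, WINDOW_END))
            return cur.fetchone()


def probe_snowflake():
    conn = snowflake.connector.connect(
        account="my_account", user="recon", password="...",
        warehouse="RECON_WH", database="DWH", schema="PUBLIC",
    )
    try:
        cur = conn.cursor()
        cur.execute(SF_SQL, (WINDOW_START, WINDOW_END))
        return cur.fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    src_count, src_sum = probe_postgres()
    dst_count, dst_sum = probe_snowflake()
    if (int(src_count), int(src_sum)) != (int(dst_count), int(dst_sum)):
        # In practice this would be emitted as a metric / alert.
        print(f"MISMATCH: source=({src_count}, {src_sum}) "
              f"target=({dst_count}, {dst_sum}) "
              f"window=[{WINDOW_START}, {WINDOW_END})")
    else:
        print("window reconciles")
```

What I'd really like, though, is the event-level version of the same idea: counts/checksums of WAL or oplog operations per window compared against Debezium event counts per topic. Counters like n_tup_ins / n_tup_upd / n_tup_del in pg_stat_user_tables get close on the source side, but they aren't windowed or tied to LSNs, which is exactly the gap I'm asking about.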
As an aside, I'm quite surprised that this isn't discussed more in blogs and on the web, given how crucial it is to having confidence in replication streams. I've had only limited success, finding just the following resources:
- https://sirupsen.com/napkin/problem-14-using-checksums-to-verify
- https://www.guru99.com/what-is-data-reconciliation.html
- https://blog.metamirror.io/cdc-drift-and-reconciliation-6cc524aa8c28
- https://shopify.engineering/capturing-every-change-shopify-sharded-monolith
- https://aws.amazon.com/blogs/big-data/build-a-distributed-big-data-reconciliation-engine-using-amazon-emr-and-amazon-athena/