Apache Nifi for MS SQL CDC using dynamic SQL query

1.8k views Asked by At

In our legacy architecture, we have a MS SQL server database, this database stores all the sensor information in near real time basis, on average per second it receives 100 records.In order to get complete information about the sensor events, we need to join 2 to 3 tables in the database.

Sample Query:

SELECT SOMETHING
FROM TABLE1 AS tab1 
INNER JOIN TABLE2 AS tab2 ON tab1.UpdateID=tab2.ID 
INNER JOIN TABLE3 as tab3 ON tab1.TagID=tab3.ID 
WHERE tab2.UpdateTime > ${lastExtractUnixTime}

Our requirement is to get the capture data change of above query every 1 minute and post records to Kafka.

Temporarily I am doing CDC using Spark Core JDBC, processing records, sending to Kafka and maintaining CDC information along with ${lastExtractUnixTime} into HBase as Phoenix table. Job is scheduled for every 1 minute batch interval.

As a long term solution, we are planning to use Apache Nifi for doing the CDC thing and post information to Kafka, Spark streaming will read messages from Kafka, will apply some business logic on it and will send the enriched data to the other Kafka topic; I don't find suitable processor, which will help me to dynamically pass the ${lastExtractUnixTime} in SQL and get the delta records every 1 or 2 minutes.

Please suggest how this can be accomplished using Apache Nifi.

0

There are 0 answers