Why use Kafka to store CDC data instead of consuming it directly with Spark?

Question

242 views Asked by Jay At 14 December 2020 at 15:12

I want to consume CDC data from multiple data sources, for example Cassandra, MySQL, Oracle, etc. I have gone through some documentation on streaming CDC data to Kafka and storing it in topics. I was wondering whether I could write Spark programs that consume data directly from the source, instead of first pushing the data into Kafka topics and then having a Spark program connect to those topics to consume the messages. Here are the questions I am trying to answer:

What is the benefit of putting Kafka in between instead of consuming changed records directly from Spark?
Won't putting Kafka in the middle add latency to the system?

Original Q&A

There is 1 answer

**Erick Ramirez** · Answer 1 · 2020-12-15T04:54:09+00:00

You certainly can write your own Spark apps to consume the data, but doing so feels like reinventing the wheel. Kafka solves this for you so you don't have to.

In addition, Kafka supports taking input from various sources as well as publishing the data to multiple subscribers, including Spark apps.
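To make the "many sources in, many subscribers out" point concrete, here is a rough sketch in plain Python. It is a toy in-memory model, not Kafka's API: the `ChangeLog` class, its methods, and the group names are all made up for illustration. It only shows the idea that each change is written once and every subscriber reads at its own pace, with the option to rewind.

```python
from collections import defaultdict


class ChangeLog:
    """Toy stand-in for one topic partition: an append-only list (hypothetical, not a Kafka API)."""

    def __init__(self):
        self._events = []
        self._offsets = defaultdict(int)  # consumer group -> next offset to read

    def append(self, event):
        """Producer side: a CDC connector writes each change once."""
        self._events.append(event)
        return len(self._events) - 1  # offset of the event just written

    def poll(self, group, max_events=10):
        """Consumer side: return unread events for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._events[start:start + max_events]
        self._offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        """Rewind a group, for example to reprocess after a failed job."""
        self._offsets[group] = offset


log = ChangeLog()
# Three different source databases feed the same log.
for source, row_id in [("mysql", 1), ("cassandra", 7), ("oracle", 3)]:
    log.append({"source": source, "op": "u", "id": row_id})

spark_batch = log.poll("spark-job")                    # reads all three events
audit_batch = log.poll("audit-service", max_events=2)  # independent, slower reader
log.seek("spark-job", 0)                               # replay from the beginning
replayed = log.poll("spark-job")
```

If Spark read straight from each database, every extra subscriber would need its own capture logic per source, and the replay in the last two lines would depend on what each source still retains.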

With Kafka, it's easier to build apps since there are connectors available for most technologies. Cheers!
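As an illustration of what a consumer gets from such a connector, here is a minimal sketch that applies change events to an in-memory table. It assumes a Debezium-style JSON envelope (`op`, `before`, `after`, optionally wrapped in `payload`); the answer does not name a specific connector, so treat that choice, the `apply_change` helper, and the `id` key field as assumptions.

```python
import json


def apply_change(table, raw_value, key_field="id"):
    """Apply one Debezium-style change event to a dict keyed by primary key.

    apply_change and key_field are hypothetical names used for this sketch.
    """
    event = json.loads(raw_value)
    # With schemas enabled the fields sit under "payload"; otherwise at top level.
    payload = event.get("payload", event)
    op = payload["op"]
    if op in ("c", "r", "u"):  # create, snapshot read, update
        row = payload["after"]
        table[row[key_field]] = row
    elif op == "d":  # delete: the old row is in "before"
        table.pop(payload["before"][key_field], None)
    else:
        raise ValueError(f"unknown op: {op!r}")
    return table


events = [
    json.dumps({"payload": {"op": "c", "before": None, "after": {"id": 1, "name": "Ann"}}}),
    json.dumps({"payload": {"op": "c", "before": None, "after": {"id": 2, "name": "Bob"}}}),
    json.dumps({"op": "u", "before": {"id": 2, "name": "Bob"}, "after": {"id": 2, "name": "Rob"}}),
    json.dumps({"payload": {"op": "d", "before": {"id": 1, "name": "Ann"}, "after": None}}),
]

table = {}
for raw in events:
    apply_change(table, raw)
```

The same per-event logic could run inside a Spark job reading the topic; the point is that the consumer deals with one event format regardless of which database produced the change.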
