Why is Spark Streaming called near real time?

Question

1.7k views Asked by dalonlobo At 11 October 2017 at 04:30

I know that Spark Streaming uses micro-batches to process the data, but in some cases the processing is done in less than a second. My question is: can't it be called pure real-time processing rather than near-real-time processing in that scenario?

There are 2 answers

vaquar khan On 11 October 2017 at 04:59

Spark Streaming divides the data stream into batches of X seconds. The resulting stream is called a DStream, which internally is a sequence of RDDs, one for each batch interval. Each RDD contains the records received during that batch interval. Because the data is processed in small batches rather than record by record, it is called near real time, not real time.
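To make the batching point concrete, here is a minimal sketch in plain Python (not Spark code) of how a batch interval affects end-to-end latency. The `Record` class, the `micro_batch_latencies` function, and the timing model are all assumptions for illustration: a record is only processed once its batch window closes, and the result is ready a fixed processing time later.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    arrival: float  # seconds since the stream started


def micro_batch_latencies(records, batch_interval, processing_time):
    """Return (value, latency) pairs under a simplified micro-batch model.

    Assumption: a record arriving in window [k*T, (k+1)*T) waits until the
    window closes at (k+1)*T, then its result is ready processing_time
    seconds later. Real schedulers are more involved than this.
    """
    results = []
    for r in records:
        batch_index = math.floor(r.arrival / batch_interval)
        batch_close = (batch_index + 1) * batch_interval
        ready_at = batch_close + processing_time
        results.append((r.value, ready_at - r.arrival))
    return results


records = [Record("a", 0.1), Record("b", 0.9), Record("c", 1.5)]
latencies = micro_batch_latencies(records, batch_interval=1.0, processing_time=0.2)
for value, latency in latencies:
    print(f"{value}: {latency:.2f}s")
```

Under this model, even with a processing time of 0.2 s, the record that arrives at 0.1 s waits about 1.1 s for its result, because it has to sit in the batch until the 1-second window closes. The wait for the batch boundary, not the processing speed, is what keeps the latency from being record-by-record.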

**Juan** · Accepted Answer · 2017-10-11T04:39:44+00:00

I'd say that we can only talk about real time for metrics, alerts and optimization when data is gathered and pushed directly to a dashboard or system, without any kind of ETL process. The purpose of real time is, mainly, speed.

Whenever there is a batch process that extracts historical trends or benchmarks, even if it takes less than a second, it is not real time but close to it. That's why people talk about near real time.

So, to answer your question, I'd say no: it is near real time because you are batching and processing.

I hope it helps.

Juan
