Why is Spark Streaming called near real time?


I know that Spark Streaming uses micro-batches to process the data, but in some cases the processing is done in less than a second. My question is: can't it be called pure real-time processing rather than near-real-time processing in that scenario?


There are 2 answers

1
Juan On BEST ANSWER

I'd say that we can only talk about real time for metrics, alerts, and optimization when data is gathered and pushed directly to a dashboard or system, without any kind of ETL process. The purpose of real time is, mainly, speed.

Whenever there is a batch process that extracts historical trends or benchmarks, even if it takes less than a second, it is not real time but close to it. That is why people talk about near real time.

So, to answer your question, I'd say no: it is near real time because you are batching and processing.

I hope it helps.

Juan

0
vaquar khan On

Spark Streaming divides the data stream into batches of X seconds. The stream is represented as a DStream, which internally is a sequence of RDDs, one for each batch interval. Each RDD contains the records received during that batch interval. Because it processes data in small batches, it is called near real time, not real time.
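
To make the batching point concrete, here is a minimal sketch in plain Python. It does not use the Spark API; it only simulates the idea of grouping records by batch interval. The function names `micro_batch` and `waiting_times`, the millisecond timestamps, and the sample records are all made up for illustration.

```python
from collections import defaultdict


def micro_batch(records, batch_interval_ms):
    """Group (arrival_ms, value) records into fixed batch intervals.

    Returns a list of (batch_end_ms, [(arrival_ms, value), ...]) ordered by
    time, loosely mimicking "one RDD per batch interval".
    Hypothetical helper, not part of Spark.
    """
    batches = defaultdict(list)
    for arrival_ms, value in records:
        # Index of the interval in which this record arrived.
        index = arrival_ms // batch_interval_ms
        batches[index].append((arrival_ms, value))
    return [
        ((index + 1) * batch_interval_ms, batches[index])
        for index in sorted(batches)
    ]


def waiting_times(records, batch_interval_ms):
    """Time each record waits until its batch closes, before any processing."""
    return [
        batch_end_ms - arrival_ms
        for batch_end_ms, batch in micro_batch(records, batch_interval_ms)
        for arrival_ms, _ in batch
    ]


if __name__ == "__main__":
    # Assumed sample data: arrival time in milliseconds and a payload.
    sample = [(100, "a"), (400, "b"), (900, "c"), (1200, "d"), (2700, "e")]
    for batch_end_ms, batch in micro_batch(sample, 1000):
        print(batch_end_ms, [value for _, value in batch])
    print("waits (ms):", waiting_times(sample, 1000))
```

In this simulation a record that arrives early in an interval has to wait for the interval to close before its batch can be processed, so even if the processing itself is very fast, the end-to-end delay includes a wait of up to one batch interval. That built-in wait is one way to read the "near" in near real time.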