How many RDD in the resulting DStream of reduceByKeyAndWindow

358 views Asked by At

I am currently working on a small spark job to compute stock correlation matrix from a DStream.

From a DStream[(time, quote)], I need to aggregate quotes (double) by time (Long) among multiple rdds, before computing correlations (considering all quotes of the rdds)

dstream.reduceByKeyAndWindow{./*aggregate quotes in Vectors*/..} 
       .forEachRDD {rdd => Statistics.corr(RDD[Vector])}

To my mind, this could be a solution if the resulting dstream (from reduceByKeyAndWindow) contains only 1 rdd with all aggregated quotes.

But I am not sure. How is the data distributed after reduceByKeyAndWindow? Is there a way to merge rdds in a dstream?

0

There are 0 answers