Scala - Spark Dstream operation similar to Cbind in R


1) I am trying to use MLlib Random Forest. My final output should have two columns:

id, predicted_value 
1,  0.5 
2,  0.4 

My feature sets are the training and scoring data (train, score). When I train and score, I drop the id field, since it is unique for each row and carries no predictive signal, so it cannot be used as a feature. After scoring I get the predictions back.

My scored output looks like:

predicted_value 
0.5 
0.4 

But I want to tie each prediction back to its id.

The id field is in one DStream and the predicted_value is in another DStream. How do I bind them together? There is no common column to join on.

For example, R has the cbind function, which can bind columns from different data frames:

x <- data.frame(cbind(testIds, p$p1))

Is something similar possible in Spark, or is there an alternative?
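
There is no direct cbind in Spark Streaming, but DStream.transformWith comes close: it hands you the two per-batch RDDs side by side so you can RDD.zip them. Note that zip requires both RDDs to have the same number of partitions and the same number of elements per partition, which in practice means both streams must derive from the same source. A minimal sketch, assuming that holds (ids, predictions, and the Long/Double element types are placeholders, not from the original post):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical streams; both must come from the same source so each
    // batch lines up element-for-element.
    // val ids: DStream[Long] = ...
    // val predictions: DStream[Double] = ...

    val bound: DStream[(Long, Double)] =
      ids.transformWith(predictions,
        (idRdd: RDD[Long], predRdd: RDD[Double]) => idRdd.zip(predRdd))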

2) I am using an MLlib RandomForest model to predict with Spark Streaming. In the end, I want to combine the feature DStream and the prediction DStream for further downstream processing. How can I do that?

Thanks in advance.

1 Answer

Answer from user7735111:

You can use DStream.transform and predict:

    dstream.transform { rdd =>
      // Score the whole batch, then pair each input row with its
      // prediction. zip is safe here because predict is a plain map
      // over rdd, so partition layout and element counts match.
      val predictions = model.predict(rdd)
      rdd.zip(predictions)
    }
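
A way to sidestep the zip alignment requirement altogether is to keep the id next to the features in a single DStream and score row by row with the single-vector overload of predict. A minimal sketch, assuming a stream of (id, features) pairs (input, model, and the Long id type are placeholders, not from the original post):

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.mllib.tree.model.RandomForestModel
    import org.apache.spark.streaming.dstream.DStream

    // Hypothetical inputs: a trained model and a stream of (id, features) pairs.
    // val model: RandomForestModel = ...
    // val input: DStream[(Long, Vector)] = ...

    // The id travels with each row, so the (id, prediction) pairing never
    // has to be reconstructed after scoring.
    val scored: DStream[(Long, Double)] = input.map {
      case (id, features) => (id, model.predict(features))
    }

This produces exactly the (id, predicted_value) pairs the question asks for, at the cost of scoring one vector at a time instead of one bulk predict call per batch.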