1) I am trying to use MLlib Random Forest. My final output should have two columns:
id, predicted_value
1, 0.5
2, 0.4
My data sets are the training data and the scoring data (train, score). Before I train and score, I drop the id field: it is unique for each row and has no predictive value, so it cannot be used as a feature. As a result, the scored output contains only the prediction.
My scored output looks like:
predicted_value
0.5
0.4
But I want to tie it back to the id.
I have the id field in one DStream and predicted_value in a separate DStream. How do I bind them to each other? I don't have any column to join on.
For example, R has a function cbind which can bind two columns from different data frames:
x<-data.frame(cbind(testIds,p$p1))
Is this possible, or are there any alternatives?
2) I am using an MLlib RandomForest model to predict with Spark Streaming. In the end, I want to combine the feature DStream and the prediction DStream for further downstream processing. How can I do that?
Thanks in advance.
You can use DStream.transform and predict:
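A minimal sketch of this approach in Scala, assuming the id travels in the same DStream as the feature vector as an (id, features) pair instead of being split into a separate stream. The helper names scoreWithIds and scoreWithFeatures, and the choice of Long for the id type, are hypothetical and only for illustration; the model is assumed to be an already trained RandomForestModel.

```scala
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.tree.model.RandomForestModel
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper for question 1: each record keeps its id next to its
// feature vector, so the id is never used as a feature but is never lost.
def scoreWithIds(
    model: RandomForestModel,
    records: DStream[(Long, Vector)]
): DStream[(Long, Double)] =
  records.transform { rdd =>
    // Predict on the feature vectors only.
    val predictions = model.predict(rdd.values)
    // keys and predictions are both element-wise maps over the same RDD,
    // so they have the same partitioning and the same number of elements
    // per partition, which is what zip requires.
    rdd.keys.zip(predictions)
  }

// Hypothetical helper for question 2: pair every feature vector with its
// prediction for further downstream processing.
def scoreWithFeatures(
    model: RandomForestModel,
    features: DStream[Vector]
): DStream[(Vector, Double)] =
  features.transform { rdd =>
    rdd.zip(model.predict(rdd))
  }
```

The key point is to keep the id (or the features) in the same stream until after prediction, rather than trying to re-join two independent DStreams that share no key. If computing the input RDD is expensive, caching it inside transform before predicting may avoid evaluating it twice.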