I am having a hard time understanding how we can both update the machine learning model and use it to make predictions in one spark streaming job.
This code is from Spark StreamingLinearRegressionExample class
val trainingData = ssc.textFileStream(args(0)).map(LabeledPoint.parse).cache()
val testData = ssc.textFileStream(args(1)).map(LabeledPoint.parse)
val numFeatures = 3
val model = new StreamingLinearRegressionWithSGD()
.setInitialWeights(Vectors.zeros(numFeatures))
model.trainOn(trainingData)
model.predictOnValues(testData.map(lp => (lp.label, lp.features))).print()
The variable model
is being updated in one stream and used to predict in another stream. I do not understand how the changes were done to the model
in the trainOn
method are reflected in predictOnValues
.