Can we update an existing model in spark-ml/spark-mllib?


We are using spark-ml to build a model from existing data. New data arrives on a daily basis.

Is there a way to read only the new data and update the existing model, without having to reread all the data and retrain every time?


There are 2 answers

Florent Moiny (BEST ANSWER)

It depends on the model you're using, but for some models Spark does exactly what you want. Look at StreamingKMeans, StreamingLinearRegressionWithSGD, StreamingLogisticRegressionWithSGD and, more broadly, StreamingLinearAlgorithm.

mathieu

To add to Florent's answer: if you are not in a streaming context, some Spark MLlib models accept an initialModel as a starting point for incremental updates. See KMeans or GaussianMixture (GMM), for instance.