I have a DataFrame that contains tuples of (String, org.apache.spark.ml.regression.LinearRegressionModel):
val result = rows.map(row => {
  val userid = row.getString(0)
  val frame = filterByUserId(userid, dataFrame)
  (userid, lr.fit(frame, "topicDistribution", "s"))
}).toDF()
When I use the foreach function, I get this error:
result.foreach(row => {
  val model = row.getAs[LinearRegressionModel](1)
  val userid = row.getString(0)
  model.save(SocialTextTest.userModelPath + userid)
})
Exception in thread "main" java.lang.UnsupportedOperationException:
No Encoder found for org.apache.spark.ml.regression.LinearRegressionModel
- field (class: "org.apache.spark.ml.regression.LinearRegressionModel", name: "_2")
- root class: "scala.Tuple2"
Should I write an Encoder myself?
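The exception is there for a reason.
The code simply does not make much sense once you get the gist of what the Dataset abstraction really is and what Encoders are for. An Encoder tells Spark how to convert a JVM object to and from its internal row format; Encoders exist for primitives, case classes, and tuples of those, not for arbitrary objects such as a fitted model.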
In essence, prepare your dataset first (as a collection of transformations on Dataset) and only when the dataset is ready train a model (aka fit a model). The model will then be outside the realm of Dataset and you won't see the exception.
The reason the exception happens when you call foreach is that that's when you trigger a computation and Spark tries to execute the code. Rewrite the code following the Machine Learning Library (MLlib) Guide, and review some of its examples to learn how to use the API.
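As a rough illustration of the advice above (prepare the data with Dataset transformations, then fit outside of any Dataset), here is a minimal sketch. It is not the asker's actual code: the object PerUserModels, the method trainAndSave and the basePath parameter are hypothetical names, and it assumes the input has a string column "userid", a Vector column "topicDistribution" used as features, and a numeric column "s" used as the label. It also assumes the number of distinct users is small enough to collect to the driver.

```scala
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper, not part of Spark.
object PerUserModels {
  // Assumed schema of `data`: "userid" (String),
  // "topicDistribution" (ml Vector, features), "s" (numeric label).
  def trainAndSave(data: DataFrame, basePath: String): Unit = {
    val spark = data.sparkSession
    import spark.implicits._

    // Only plain strings go through a Dataset here, so the built-in
    // String Encoder is enough; no Encoder for the model is required.
    val userIds: Array[String] =
      data.select("userid").distinct().as[String].collect()

    val lr = new LinearRegression()
      .setFeaturesCol("topicDistribution")
      .setLabelCol("s")

    // This loop runs on the driver. Each fit() is still a distributed
    // Spark job, but the resulting model is an ordinary driver-side
    // object that never becomes a Dataset element.
    userIds.foreach { userId =>
      val frame = data.filter(col("userid") === userId)
      val model = lr.fit(frame)
      model.write.overwrite().save(basePath + userId)
    }
  }
}
```

You would call it as PerUserModels.trainAndSave(dataFrame, SocialTextTest.userModelPath). The models are fitted one after another, which may be slow with many users, but the fitted model never has to be stored in a Dataset.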