I have two datasets: Dataset[User] and Dataset[Book] where both User and Book are case classes. I join them like this:
```scala
val joinDS = ds1.join(ds2, "userid")
```
If I try to map over each element in joinDS, the compiler complains that an encoder is missing:
```
not enough arguments for method map: (implicit evidence$46: org.apache.spark.sql.Encoder[Unit])org.apache.spark.sql.Dataset[Unit].
Unspecified value parameter evidence$46.
Unable to find encoder for type stored in a Dataset.
```
But the same error does not occur if I use foreach instead of map. Why doesn't foreach require an encoder as well? I have already imported all the implicits from the Spark session, so why does map require an encoder at all when the dataset is the result of joining two datasets of case classes? And what type of dataset do I get from that join: is it a Dataset[Row], or something else?
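Here is a minimal example that reproduces this (the exact fields don't matter, only the shared userid column does):

```scala
import org.apache.spark.sql.SparkSession

// Illustrative schemas; only the userid join key matters here.
case class User(userid: Long, name: String)
case class Book(userid: Long, title: String)

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val ds1 = Seq(User(1L, "Alice")).toDS() // Dataset[User]
val ds2 = Seq(Book(1L, "Dune")).toDS()  // Dataset[Book]

val joinDS = ds1.join(ds2, "userid")

joinDS.foreach(row => println(row)) // compiles fine
joinDS.map(row => println(row))     // fails: no Encoder[Unit] available
```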
TL;DR

An `Encoder` is required to transform the outcome into the internal Spark SQL format, and there is no need for that in the case of `foreach` (or any other sink).

Just take a look at the signatures. `map` is

```scala
def map[U](func: (T) => U)(implicit evidence: Encoder[U]): Dataset[U]
```

so in plain words it transforms records from `T` to `U`, and then uses the `Encoder` of `U` to convert the result to the internal representation. `foreach`, on the other hand, is

```scala
def foreach(f: (T) => Unit): Unit
```

In other words, it doesn't expect any result. Since there is no result to be stored, an `Encoder` is simply not needed.
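To make this concrete, here is a sketch continuing the example from the question: use `foreach` for side effects, and for transformations have `map` return a type that has an `Encoder` in scope (`spark.implicits._` provides them for primitives, tuples and case classes). Note that `join` with a column name returns an untyped `DataFrame`, i.e. `Dataset[Row]`, so `map` operates on `Row`s here. The `"title"` column comes from the illustrative schema in the question:

```scala
import spark.implicits._ // Encoders for primitives, tuples, case classes

// Side effect only: nothing is stored back, so no Encoder is needed.
joinDS.foreach(row => println(row))

// Transformation: the result type must be encodable. Encoder[String]
// comes from spark.implicits._; getAs[String]("title") assumes the
// illustrative Book schema from the question.
val titles = joinDS.map(row => row.getAs[String]("title"))
titles.show()
```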