How to set encoder for Spark dataset when importing csv or txt file

I'm having an issue with this part of the Spark MLlib code from the docs (https://spark.apache.org/docs/latest/ml-collaborative-filtering.html), whether I use csv or txt files:

val ratings = spark.read.textFile("data/mllib/als/sample_movielens_ratings.txt")
  .map(parseRating)
  .toDF()

I get the following error:

Error:(31, 11) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

.map(parseRating)
      ^

I also have the following at the start of my object:

val conf = new SparkConf().setMaster("local[*]").set("spark.executor.memory", "2g")
val spark = SparkSession.builder.appName("Mlibreco").config(conf).getOrCreate()
import spark.implicits._

It seems that the read.textFile method needs an encoder. I have found a few articles on how to set an encoder, but I don't know how to apply them when importing a csv or txt file. Given that the docs mention nothing about encoders, it is also very likely that I have missed something obvious.

There is 1 answer

user10089632 answered:

Try this:

val sparkSession: SparkSession = ***
import sparkSession.implicits._
val dataset = sparkSession.createDataset(dataList)

Also see this link to find one of the predefined encoders. Here
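To make this more concrete, here is a minimal sketch of how it could look for the ratings example. It rests on an assumption the question does not confirm: that the Rating case class is declared inside main (or another method). That is a common cause of this exact error, because spark.implicits._ can only derive an encoder for case classes declared outside the method. The Rating fields and the "::" separator follow the docs example. The in-memory sample lines are made up for illustration and stand in for spark.read.textFile(...).

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Declared at the top level, NOT inside main, so an Encoder[Rating] can be derived.
// (Assumption: in the failing code this class lives inside a method.)
case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

object Mlibreco {
  // Parses one "userId::movieId::rating::timestamp" line, as in the docs example.
  def parseRating(str: String): Rating = {
    val fields = str.split("::")
    require(fields.length == 4, s"Malformed rating line: $str")
    Rating(fields(0).toInt, fields(1).toInt, fields(2).toFloat, fields(3).toLong)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("Mlibreco")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample lines; with the real file you would use
    // spark.read.textFile("data/mllib/als/sample_movielens_ratings.txt") instead.
    val lines: Dataset[String] =
      spark.createDataset(Seq("0::2::3::1424380312", "0::3::1::1424380312"))

    // Option 1: the implicit encoder from spark.implicits._ is picked up automatically.
    val ratings = lines.map(parseRating).toDF()

    // Option 2: pass one of the predefined encoders explicitly.
    val ratingEncoder: Encoder[Rating] = Encoders.product[Rating]
    val ratingsExplicit: Dataset[Rating] = lines.map(parseRating)(ratingEncoder)

    ratings.show()
    ratingsExplicit.show()
    spark.stop()
  }
}
```

If Rating is already declared at the top level in your code, Option 2 is still a quick way to check whether implicit resolution is what is failing.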