How to set encoder for Spark dataset when importing csv or txt file

Question

167 views Asked by Christopher Mills At 01 September 2017 at 20:46

I'm having an issue with this part of the Spark MLlib code from the docs (https://spark.apache.org/docs/latest/ml-collaborative-filtering.html), whether I use a csv or a txt file:

val ratings = spark.read.textFile("data/mllib/als/sample_movielens_ratings.txt")
  .map(parseRating)
  .toDF()

I get the following error:

Error:(31, 11) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

.map(parseRating)
      ^

I also have the following at the start of my object:

val conf = new SparkConf().setMaster("local[*]").set("spark.executor.memory", "2g")
val spark = SparkSession.builder.appName("Mlibreco").config(conf).getOrCreate()
import spark.implicits._

It seems that the read.textFile method needs an encoder. I have found a few articles on how to set an encoder, but I don't know how to apply that when importing a csv or txt file. Since the docs don't mention encoders at all, it is also very likely that I have missed something obvious.

Original Q&A

There is 1 answer

**user10089632** · Answer 1 · 2017-09-01T20:54:07+00:00

Try this

val sparkSession: SparkSession = ***
import sparkSession.implicits._
val dataset = sparkSession.createDataset(dataList)

Also see this link to find one of the predefined encoders.
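To make this concrete for the textFile case in the question, here is a minimal sketch, not a confirmed fix. It assumes the Rating case class and parseRating function look like the ones in the Spark ALS docs example, and that the error comes from one commonly reported cause: the case class being declared inside the method that builds the Dataset, so no implicit encoder can be derived for it. The object name EncoderSketch is only illustrative. It shows both the implicit route (via spark.implicits._) and passing a predefined encoder explicitly with Encoders.product.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Assumption: same shape as the Rating class in the Spark ALS docs example.
// Declared at the top level (not inside a method) so an encoder can be derived for it.
case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

// Hypothetical object name, used only for this sketch.
object EncoderSketch {

  // Parses one "userId::movieId::rating::timestamp" line into a Rating.
  def parseRating(str: String): Rating = {
    val fields = str.split("::")
    require(fields.length == 4, s"expected 4 fields, got ${fields.length}")
    Rating(fields(0).toInt, fields(1).toInt, fields(2).toFloat, fields(3).toLong)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("Mlibreco")
      .master("local[*]")
      .getOrCreate()

    // Brings the implicit encoders for primitives and case classes into scope.
    import spark.implicits._

    // textFile returns a Dataset[String]; its String encoder is built in.
    val lines: Dataset[String] =
      spark.read.textFile("data/mllib/als/sample_movielens_ratings.txt")

    // Option 1: rely on the implicit encoder for Rating from spark.implicits._
    val ratings = lines.map(parseRating _).toDF()

    // Option 2: build the encoder explicitly and pass it to map.
    val ratingEncoder: Encoder[Rating] = Encoders.product[Rating]
    val ratingsExplicit = lines.map(parseRating _)(ratingEncoder).toDF()

    ratings.printSchema()
    ratingsExplicit.printSchema()

    spark.stop()
  }
}
```

If the error persists with the case class at the top level, the explicit Encoders.product[Rating] variant at least makes it visible which encoder the map call is using.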
