While searching for the best serialization techniques for Apache Spark, I found this link:
https://github.com/scala/pickling#scalapickling
which states that serialization in Scala will be faster and more automatic with this framework. Scala Pickling also lists several advantages (ref: https://github.com/scala/pickling#what-makes-it-different).
So I wanted to know whether Scala Pickling (PickleSerializer) can be used in Apache Spark instead of KryoSerializer.
- If yes, what changes need to be made? (An example would be helpful.)
- If no, why not? (Please explain.)
Thanks in advance, and forgive me if I am wrong.
Note: I am using the Scala language to code my Apache Spark (version 1.4.1) application.
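For context, switching Spark to Kryo in a 1.4.x application is done through SparkConf; the sketch below uses the documented `spark.serializer` config key and the `registerKryoClasses` helper, with a hypothetical `MyRecord` case class standing in for the application's own types:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical record type standing in for application classes
case class MyRecord(id: Long, name: String)

// Switch Spark's serializer from the default JavaSerializer to Kryo
val conf = new SparkConf()
  .setAppName("kryo-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // Registering classes up front lets Kryo write compact IDs
  // instead of full class names
  .registerKryoClasses(Array(classOf[MyRecord]))

val sc = new SparkContext(conf)
```

The question is essentially whether a PickleSerializer could be dropped in at the same `spark.serializer` extension point.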
I visited Databricks for a couple of months in 2014 to try to incorporate a PicklingSerializer into Spark somehow, but couldn't find a way to feed the type information needed by scala/pickling into Spark without changing Spark's interfaces. At the time, changing interfaces in Spark was a no-go. E.g., RDDs would need to carry Pickler[T] type information in their interface in order for the generation mechanism in scala/pickling to kick in.

All of that changed with Spark 2.0.0. If you use Datasets or DataFrames, you get so-called Encoders. These are even more specialized than scala/pickling.

Use Datasets in Spark 2.x. They are much more performant on the serialization front than plain RDDs.
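To illustrate the Dataset/Encoder route in Spark 2.x: the sketch below (names like Person are hypothetical) relies on `import spark.implicits._` to derive an Encoder for the case class at compile time, so no Kryo or Java serialization is involved for the row data:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical domain type; an Encoder[Person] is derived automatically
case class Person(name: String, age: Int)

val spark = SparkSession.builder()
  .appName("encoder-example")
  .master("local[*]")
  .getOrCreate()

// Brings implicit Encoders for case classes and primitives into scope
import spark.implicits._

val ds = spark.createDataset(Seq(Person("ann", 30), Person("bob", 25)))

// Typed operations still work, and serialization goes through the
// generated Encoder rather than a general-purpose serializer
ds.filter(_.age > 26).show()
```

This is the closest Spark gets to what scala/pickling aimed for: type-directed, compile-time generation of serialization code.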