I'm trying to parse protobuf (proto3) data in Spark 2.4 and I'm having some trouble with the ByteString type. I've created the case class using the ScalaPB library and loaded the jar into a spark-shell. I've also tried creating an implicit encoder for the type; however, I still get the following error:
java.lang.UnsupportedOperationException: No Encoder found for com.google.protobuf.ByteString
Here is what I've tried so far:
import proto.Event._ // my proto case class
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoders.kryo
// Register our UDTs to avoid "<none> is not a term" error:
EventProtoUdt.register()
val inputFile = "data.avro"
object ByteStringEncoder {
  implicit def byteStringEncoder: Encoder[com.google.protobuf.ByteString] = kryo[com.google.protobuf.ByteString]
}
import ByteStringEncoder._
import spark.implicits._
// Decode a base64-encoded line into an Event message
def parseLine(s: String): Event = Event.parseFrom(org.apache.commons.codec.binary.Base64.decodeBase64(s))
import scalapb.spark._
val eventsDf = spark.read.format("avro").load(inputFile)
val eventsDf2 = eventsDf
  .map(row => row.getAs[Array[Byte]]("Body"))
  .map(Event.parseFrom(_))
Any help is appreciated.
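The implicit Encoder[ByteString] is never consulted here: spark.implicits._ derives Encoder[Event] through runtime reflection over the case class fields (Encoders.product), not through implicit search on each field type, so the derivation fails as soon as it reaches the ByteString field. Until you can upgrade (see below), a minimal workaround sketch, assuming your generated Event case class and the Body column from the question, is to kryo-encode the whole message; the Dataset then holds one opaque binary column instead of a typed schema:

import org.apache.spark.sql.{Encoder, Encoders}
import proto.Event._

// Serialize the entire message with kryo, sidestepping field-by-field reflection
implicit val eventEncoder: Encoder[Event] = Encoders.kryo[Event]

val events = spark.read.format("avro").load("data.avro")
  .map(row => row.getAs[Array[Byte]]("Body"))(Encoders.BINARY) // explicit encoder for the raw bytes
  .map(Event.parseFrom(_)) // resolves eventEncoder via implicit search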
This issue has been fixed in sparksql-scalapb 0.9.0. Please see the updated documentation on setting the imports so an Encoder for ByteString is picked up by implicit search.
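For completeness, a minimal sketch of the 0.9.x setup, assuming the same Body column and avro file from the question; scalapb.spark.Implicits._ is imported in place of spark.implicits._ and supplies encoders for generated messages, ByteString fields included:

import org.apache.spark.sql.Encoders
import scalapb.spark.Implicits._ // used instead of spark.implicits._ for ScalaPB types
import proto.Event._

val events = spark.read.format("avro").load("data.avro")
  .map(row => row.getAs[Array[Byte]]("Body"))(Encoders.BINARY)
  .map(Event.parseFrom(_)) // Encoder[Event] now picked up by implicit search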