Parsing Protobuf ByteString in Spark not working after creating Encoder


I'm trying to parse protobuf (proto3) data in Spark 2.4 and I'm having trouble with the ByteString type. I've generated the case class with the ScalaPB library and loaded the jar into a spark shell. I've also tried creating an implicit encoder for the type, but I still get the following error:

java.lang.UnsupportedOperationException: No Encoder found for com.google.protobuf.ByteString

Here is what I've tried so far:

import proto.Event._ // my proto case class
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.Encoders.kryo

// Register our UDTs to avoid "<none> is not a term" error:
EventProtoUdt.register()

val inputFile = "data.avro"

object ByteStringEncoder {
  implicit def byteStringEncoder: Encoder[com.google.protobuf.ByteString] =
    org.apache.spark.sql.Encoders.kryo[com.google.protobuf.ByteString]
}

import ByteStringEncoder._
import spark.implicits._

def parseLine(s: String): Event =
  Event.parseFrom(org.apache.commons.codec.binary.Base64.decodeBase64(s))

import scalapb.spark._
val eventsDf = spark.read.format("avro").load(inputFile)

val eventsDf2 = eventsDf.map(row => row.getAs[Array[Byte]]("Body")).map(Event.parseFrom(_))

Any help is appreciated.


1 Answer

Answered by thesamet:

This issue has been fixed in sparksql-scalapb 0.9.0. Please see the updated documentation on setting up the imports so that an Encoder for ByteString is picked up by implicit search.
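
For reference, a minimal sketch of the setup the updated documentation describes, assuming sparksql-scalapb 0.9.0+ is on the classpath and proto.Event is the ScalaPB-generated class from the question. The scalapb.spark.Implicits._ import is meant to replace both the hand-rolled kryo encoder and spark.implicits._ (importing both can cause ambiguous-implicit errors):

// A minimal sketch, assuming sparksql-scalapb 0.9.0+ and the question's Event class.
import org.apache.spark.sql.SparkSession
import scalapb.spark.Implicits._ // supplies Encoders for ScalaPB messages, ByteString fields included

import proto.Event

val spark = SparkSession.builder.getOrCreate()

// Parsing inside a single map means only an Encoder[Event] is needed, which the
// imported implicits provide; no custom kryo encoder for ByteString is required.
val events = spark.read.format("avro").load("data.avro")
  .map(row => Event.parseFrom(row.getAs[Array[Byte]]("Body")))

With this in place the resulting Dataset[Event] uses sparksql-scalapb's own Encoder rather than kryo, so ByteString fields should no longer trigger the UnsupportedOperationException above.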