Extra bytes being added to serialization with BooPickle and RocksDb


I'm using BooPickle to serialize Scala classes before writing them to RocksDB. To serialize this class:

case class Key(a: Long, b: Int) {
  def toStringEncoding: String = s"${a}-${b}"
}

I have this implicit class

  implicit class KeySerializer(key: Key) {
    def serialize: Array[Byte] = 
      Pickle.intoBytes(key.toStringEncoding).array
  }

The toStringEncoding method is necessary because BooPickle wasn't serializing the case class in a way that worked well with RocksDB's requirements on key ordering. I then write a bunch of key-value pairs to several SST files and ingest them into RocksDB. However, when I look up the keys in the db, they're not found.

If I iterate over all of the keys in the db, I find that the keys were written successfully, but their byte representation in the db contains extra bytes. For example, if key.serialize outputs something like this

Array[Byte] = Array( 25, 49, 54, 48, 53, 55, 52, 52, 48, 48, 48, 45, 48, 45, 49, 54, 48, 53, 55, 52, 52, 48, 51, 48, 45, 48, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...)

What I'll find in the db is something like this

Array[Byte] = Array( 25, 49, 54, 48, 53, 55, 52, 52, 48, 48, 48, 45, 48, 45, 49, 54, 48, 53, 55, 52, 52, 48, 51, 48, 45, 48, 51, 101, 52, 97, 49, 100, 102, 48, 50, 53, 5, 8, ...)

Extra non-zero bytes replace the zero bytes at the end of the byte array. In addition, the sizes of the byte arrays differ: when I call the serialize method the byte array has 512 elements, but when I retrieve the key from the db it has 4112. Does anyone know what might be causing this?


There are 2 answers

simpadjo (best answer)

I have no experience with RocksDB or BooPickle, but I suspect the problem is the call to ByteBuffer.array. It returns the whole array backing the byte buffer, ignoring the buffer's position and limit, rather than just the part that holds your data.

See, for example, the question "Gets byte array from a ByteBuffer in java" for how to properly extract the data from a ByteBuffer.
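To make the difference concrete, here is a minimal sketch that uses only java.nio and does not involve BooPickle or RocksDB at all. It assumes a serializer that writes into a larger pre-allocated buffer (the 512-byte capacity is taken from the size reported in the question; BackingArrayDemo and filledBuffer are hypothetical names used only for illustration). The payload bytes are the non-padding bytes from the question's example output.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets

object BackingArrayDemo {
  // Non-padding bytes copied from the example output in the question.
  val payload: Array[Byte] = Array[Byte](
    25, 49, 54, 48, 53, 55, 52, 52, 48, 48, 48, 45, 48, 45,
    49, 54, 48, 53, 55, 52, 52, 48, 51, 48, 45, 48)

  // Assumption: stands in for a serializer that writes into a larger buffer.
  def filledBuffer(): ByteBuffer = {
    val buf = ByteBuffer.allocate(512)
    buf.put(payload)
    buf.flip() // position = 0, limit = payload.length
    buf
  }

  def main(args: Array[String]): Unit = {
    val buf = filledBuffer()

    // array() exposes the entire backing array, padding included.
    val whole = buf.array()

    // Copying `remaining` bytes yields only the data between position and limit.
    val exact = new Array[Byte](buf.remaining)
    buf.get(exact)

    println(whole.length) // 512
    println(exact.length) // 26
    // Skip the first byte and decode the rest as ASCII.
    println(new String(exact, 1, exact.length - 1, StandardCharsets.US_ASCII))
    // prints 1605744000-0-1605744030-0
  }
}
```

Since the lookup key has to match the stored key byte for byte, a key that carries the padding of the backing array will not match one that carries different padding, even when the meaningful prefix is identical.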

Levi Ramsey

The BooPickle docs suggest the following for getting BooPickled data as a byte array:

val data: Array[Byte] = Array.ofDim[Byte](buf.remaining)
buf.get(data)

So in your case it would be something like

def serialize: Array[Byte] = {
  val buf = Pickle.intoBytes(key.toStringEncoding)
  // Copy only the bytes between position and limit, not the whole backing array
  val arr = Array.ofDim[Byte](buf.remaining)
  buf.get(arr)
  arr
}
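If the same extraction is needed in several serializers, it could be factored into a small helper. The following is only a sketch: ByteBufferOps and remainingBytes are hypothetical names, not part of BooPickle. It copies from a duplicate of the buffer so that the caller's position is left untouched, and because it never calls array() it also works for buffers that have no accessible backing array (for example direct buffers).

```scala
import java.nio.ByteBuffer

object ByteBufferOps {
  // Hypothetical helper: returns a copy of the bytes between position and limit.
  def remainingBytes(buf: ByteBuffer): Array[Byte] = {
    // duplicate() shares the content but has its own position and limit,
    // so reading from it does not advance the original buffer.
    val view = buf.duplicate()
    val out = new Array[Byte](view.remaining)
    view.get(out)
    out
  }
}
```

With such a helper, the body of serialize would reduce to ByteBufferOps.remainingBytes(Pickle.intoBytes(key.toStringEncoding)).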