How can I recover the original bytes that I used to build a larger byte array?


I have a method that builds one byte array in a specific format:

  • First it gets avroBytes.
  • Then it compresses them with Snappy.
  • Finally, it writes a 14-byte header (client ID, timestamp, compressed length) followed by the compressed bytes into a new byte array.

Below is the method:

  public static byte[] serialize(final Record record, final int clientId,
      final Map<String, String> holderMap) throws IOException {
    byte[] avroBytes = getAvroBytes(holderMap, record);
    byte[] snappyCompressed = Snappy.compress(avroBytes);

    // header: 2-byte clientId + 8-byte timestamp + 4-byte payload length
    int size = (2+8+4) + snappyCompressed.length;

    ByteBuffer buffer = ByteBuffer.allocate(size);
    buffer.order(ByteOrder.BIG_ENDIAN);
    buffer.putShort((short) clientId);            // bytes 0-1
    buffer.putLong(System.currentTimeMillis());   // bytes 2-9
    buffer.putInt(snappyCompressed.length);       // bytes 10-13
    buffer.put(snappyCompressed);                 // bytes 14 onward
    buffer.rewind();

    byte[] bytesToStore = new byte[size];
    buffer.get(bytesToStore);

    return bytesToStore;
  }

Now, given bytesToStore, I want to get back the original avroBytes:

  byte[] bytesToStore = serialize(......);
  // now how can I get actual `avroBytes` using bytesToStore?

Is there any way to get it back?


There are 2 answers

clstrfsck (accepted answer)

Based on the code, the header is 2 + 8 + 4 = 14 bytes (a short client ID, a long timestamp and an int length), so the compressed data starts at bytesToStore[14]. One simple, though not necessarily the most efficient, way is to copy the bytes from that offset to the end of the array and call Snappy.uncompress(bytes).

Something like this:

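  // requires java.util.Arrays and the same Snappy class you used for compression
  public static final int HEADER_SIZE = 2 + 8 + 4; // short clientId + long timestamp + int length

  public static byte[] extractAvroBytes(byte[] bytesToStore) throws IOException {
    // Skip the 14-byte header; everything after it is the Snappy-compressed payload
    byte[] bytes = Arrays.copyOfRange(bytesToStore, HEADER_SIZE, bytesToStore.length);
    return Snappy.uncompress(bytes);
  }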

I haven't tested this, so some tweaking may be required.
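A slightly more defensive variant would read the header back with ByteBuffer and use the stored length field, rather than assuming the payload runs to the end of the array. This is only a sketch assuming the exact layout written by serialize (big-endian short, long, int, then payload); the class HeaderReader and method compressedPayload are hypothetical names. It only extracts the compressed bytes, which you would still hand to Snappy.uncompress.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class HeaderReader {
    // Layout written by serialize: short clientId, long timestamp, int length, payload
    static final int HEADER_SIZE = 2 + 8 + 4;

    // Returns the compressed payload using the stored length field
    static byte[] compressedPayload(byte[] bytesToStore) {
        ByteBuffer buffer = ByteBuffer.wrap(bytesToStore); // big-endian by default
        buffer.getShort(); // clientId (not needed here)
        buffer.getLong();  // timestamp (not needed here)
        int length = buffer.getInt();
        if (length < 0 || length > buffer.remaining()) {
            throw new IllegalArgumentException("corrupt header: length " + length);
        }
        byte[] compressed = new byte[length];
        buffer.get(compressed);
        return compressed;
    }

    public static void main(String[] args) {
        // Hand-built record in the same layout with a dummy 3-byte payload
        ByteBuffer b = ByteBuffer.allocate(HEADER_SIZE + 3);
        b.putShort((short) 7).putLong(123L).putInt(3).put(new byte[] {1, 2, 3});
        System.out.println(Arrays.toString(compressedPayload(b.array()))); // [1, 2, 3]
    }
}
```

Checking the length against buffer.remaining() turns a truncated or corrupted record into a clear exception instead of passing garbage to the decompressor.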

Depending on which Java Snappy library you are using, there may be methods that decompress data directly from an offset inside the serialized array, without making an intermediate copy.
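To illustrate the offset-based idea without depending on a particular Snappy binding, the sketch below swaps Snappy for the JDK's java.util.zip Deflater/Inflater. That is a different compression format, used here purely as a stand-in. Inflater.setInput(byte[], int, int) reads the compressed bytes in place from offset 14 of bytesToStore, so no copy of the payload is made. The class OffsetRoundTrip and its methods are hypothetical. If you happen to use snappy-java (org.xerial.snappy), as far as I know its Snappy class has similar overloads, such as uncompressedLength(byte[], int, int) and uncompress(byte[], int, int, byte[], int), but check the version you depend on.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class OffsetRoundTrip {
    static final int HEADER_SIZE = 2 + 8 + 4;

    // Stand-in for Snappy.compress: DEFLATE via java.util.zip (NOT Snappy-compatible)
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    // Same header layout as serialize(): short, long, int, then payload
    static byte[] serialize(byte[] payload, int clientId, long timestamp) {
        byte[] compressed = compress(payload);
        return ByteBuffer.allocate(HEADER_SIZE + compressed.length)
                .putShort((short) clientId)
                .putLong(timestamp)
                .putInt(compressed.length)
                .put(compressed)
                .array();
    }

    // Decompresses straight from the offset inside bytesToStore, without copying the payload
    static byte[] extract(byte[] bytesToStore) throws DataFormatException {
        // Absolute read of the 4-byte length field at offset 10 (big-endian by default)
        int length = ByteBuffer.wrap(bytesToStore).getInt(HEADER_SIZE - 4);
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(bytesToStore, HEADER_SIZE, length);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or unsupported payload");
                }
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] original = "hello avro bytes".getBytes(StandardCharsets.UTF_8);
        byte[] stored = serialize(original, 42, System.currentTimeMillis());
        System.out.println(Arrays.equals(original, extract(stored))); // true
    }
}
```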

Darshan Mehta

From the code, it looks like you already have a method that returns avroBytes:

  byte[] avroBytes = getAvroBytes(holderMap, record);

This method takes holderMap and record as arguments, and wherever serialize is called you already have those two values. So, if you can change the code, call getAvroBytes before calling serialize, keep the result, and pass it to serialize as an argument instead of computing it inside the method.
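A minimal sketch of that refactoring, assuming you are free to change serialize's signature: the caller computes avroBytes once, keeps the reference, and passes it in. The Compressor interface is a hypothetical hook added only so the example compiles without a Snappy dependency; in the real code you could pass something like Snappy::compress, or simply keep calling Snappy.compress inside the method as before.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

public class PrecomputedAvro {
    // Hypothetical hook standing in for Snappy.compress
    @FunctionalInterface
    interface Compressor {
        byte[] compress(byte[] input) throws IOException;
    }

    // Refactored: takes already-computed avroBytes instead of building them internally
    static byte[] serialize(byte[] avroBytes, int clientId, Compressor compressor) throws IOException {
        byte[] compressed = compressor.compress(avroBytes);
        return ByteBuffer.allocate(2 + 8 + 4 + compressed.length) // big-endian by default
                .putShort((short) clientId)
                .putLong(System.currentTimeMillis())
                .putInt(compressed.length)
                .put(compressed)
                .array();
    }

    public static void main(String[] args) throws IOException {
        // In the real code: byte[] avroBytes = getAvroBytes(holderMap, record);
        byte[] avroBytes = {10, 20, 30};
        // Identity "compressor" for the demo only
        byte[] bytesToStore = serialize(avroBytes, 1, input -> input.clone());
        // avroBytes is still in hand here; nothing has to be decoded from bytesToStore
        System.out.println(Arrays.toString(avroBytes) + " -> " + bytesToStore.length + " bytes stored");
        // prints "[10, 20, 30] -> 17 bytes stored"
    }
}
```

The trade-off is that the caller now owns the Avro encoding step, which avoids a decompress round trip entirely when the original bytes are needed at the same call site.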