How do I create a dictionary object for passing to Zstd.compress?


I am using Zstd compression in Java, via the zstd-jni library, to compress a large JSON payload. I create a byte array from the JSON string and call this method:

public static byte[] compress(byte[] var0, int var1)

I read that Zstd gives better compression when a dictionary is passed during compression and decompression. How do I create a ZstdDictCompress object? What byte array and integer should I pass to the constructor?

public static long compress(byte[] var0, byte[] var1, ZstdDictCompress var2)

Answer by koigor:

This example is for https://github.com/luben/zstd-jni.

First, collect many samples of your JSON payloads. One or two samples are not enough to train a useful dictionary. Then train the dictionary:

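import com.github.luben.zstd.ZstdDictTrainer;
import java.nio.charset.StandardCharsets;
import java.util.List;

List<String> jsons = ...; // List of your JSON samples

// First argument: size of the sample buffer (1 MB); second: target dictionary size (16 KB)
ZstdDictTrainer trainer = new ZstdDictTrainer(1024 * 1024, 16 * 1024);

for (String json : jsons) {
    trainer.addSample(json.getBytes(StandardCharsets.UTF_8));
}

byte[] dictionary = trainer.trainSamples();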
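
A note on the first constructor argument (1024 * 1024 in the code above): it is the size of the buffer that holds the training samples, so samples beyond that budget cannot be added. As far as I can tell, addSample returns false when a sample no longer fits, so it may help to choose the samples up front. Below is a minimal sketch using only the JDK. SampleBudget and select are hypothetical names, not part of zstd-jni, and stopping at the first sample that does not fit is just one possible policy.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SampleBudget {
    private SampleBudget() {}

    /**
     * Encodes the samples as UTF-8 and keeps them in order, stopping at the
     * first sample that would push the total above budgetBytes.
     * Empty samples are skipped.
     */
    static List<byte[]> select(List<String> jsons, int budgetBytes) {
        List<byte[]> selected = new ArrayList<>();
        long used = 0;
        for (String json : jsons) {
            byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
            if (bytes.length == 0) {
                continue; // nothing to learn from an empty sample
            }
            if (used + bytes.length > budgetBytes) {
                break; // budget exhausted
            }
            selected.add(bytes);
            used += bytes.length;
        }
        return selected;
    }
}
```

The returned arrays could then be passed to trainer.addSample one by one, with the budget set to the same value as the trainer's first constructor argument.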

Now you have your dictionary as a byte array.
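
Because the decompressing side needs exactly the same bytes, it can be useful to store the dictionary (for example in a file) and keep a fingerprint of it next to the compressed data, so that a mismatch can be noticed before decompression is attempted. A minimal sketch using only JDK classes follows. DictionaryFingerprint and sha256Hex are hypothetical names, not part of zstd-jni.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class DictionaryFingerprint {
    private DictionaryFingerprint() {}

    /** Returns the SHA-256 digest of the dictionary bytes as lowercase hex. */
    static String sha256Hex(byte[] dictionary) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(dictionary);
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to 0..255 so every byte becomes exactly two hex digits
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform
            throw new IllegalStateException(e);
        }
    }
}
```

The producer would compute the fingerprint once after training, and the consumer would compare it against the fingerprint of the dictionary it loaded.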

The next step is to use the SAME dictionary for both compression and decompression.

import com.github.luben.zstd.Zstd;
import com.github.luben.zstd.ZstdDictCompress;
import com.github.luben.zstd.ZstdDictDecompress;
import java.nio.charset.StandardCharsets;

// Compress
byte[] json = jsonString.getBytes(StandardCharsets.UTF_8);
ZstdDictCompress zstdDictCompress = new ZstdDictCompress(dictionary, Zstd.defaultCompressionLevel());
byte[] compressed = Zstd.compress(json, zstdDictCompress);

// Important: this decompress overload needs the original (uncompressed) length,
// so keep it together with the compressed data
int jsonFullLength = json.length;

// Decompress with a ZstdDictDecompress built from the same dictionary bytes
ZstdDictDecompress zstdDictDecompress = new ZstdDictDecompress(dictionary);
byte[] decompressed = Zstd.decompress(compressed, zstdDictDecompress, jsonFullLength);
String jsonString2 = new String(decompressed, StandardCharsets.UTF_8);
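
In the snippet above the original length is still available in a local variable, but in practice compression and decompression usually happen in different places. One simple way to carry the length along is to prefix the compressed bytes with it. This is only a sketch under that assumption, using JDK classes; LengthPrefixedFrame and its methods are hypothetical names, not part of zstd-jni, and the 4-byte big-endian prefix is an arbitrary choice.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class LengthPrefixedFrame {
    private LengthPrefixedFrame() {}

    /** Prepends the original (uncompressed) length as a 4-byte big-endian int. */
    static byte[] wrap(int originalLength, byte[] compressed) {
        if (originalLength < 0) {
            throw new IllegalArgumentException("originalLength must not be negative");
        }
        return ByteBuffer.allocate(Integer.BYTES + compressed.length)
                .putInt(originalLength)
                .put(compressed)
                .array();
    }

    /** Reads the original length back from the first 4 bytes of the frame. */
    static int originalLength(byte[] frame) {
        requireHeader(frame);
        return ByteBuffer.wrap(frame).getInt();
    }

    /** Returns the compressed bytes that follow the 4-byte length prefix. */
    static byte[] payload(byte[] frame) {
        requireHeader(frame);
        return Arrays.copyOfRange(frame, Integer.BYTES, frame.length);
    }

    private static void requireHeader(byte[] frame) {
        if (frame.length < Integer.BYTES) {
            throw new IllegalArgumentException("frame is shorter than the length prefix");
        }
    }
}
```

On the compressing side you would store or send LengthPrefixedFrame.wrap(json.length, compressed). On the other side, payload(frame) and originalLength(frame) give the two values to pass to Zstd.decompress together with the ZstdDictDecompress.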

That's all!