How do I compress with commons compress in memory?


I am trying to bzip2-compress data in memory using Apache Commons Compress. This is what I have so far:

private static final int bufferSize = 8192;

public void compress(
    ByteArrayInputStream byteArrayInputStream,
    CompressorOutputStream compressorOutputStream) throws IOException {
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    final byte[] buffer = new byte[bufferSize];
    int n = 0;
    while (-1 != (n = byteArrayInputStream.read(buffer)))
        compressorOutputStream.write(buffer, 0, n);
}

public byte[] compressBZIP2(byte[] inputBytes) throws Exception {
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(inputBytes);
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    BZip2CompressorOutputStream bZip2CompressorOutputStream = new BZip2CompressorOutputStream(byteArrayOutputStream);
    compress(byteArrayInputStream, bZip2CompressorOutputStream);
    return byteArrayOutputStream.toByteArray();
}

But this doesn't work:

byte[] bzipCompressed = resultCompressor.compressBZIP2(contentBytes);

The result always has 3 bytes, and that's all.

What am I doing wrong?


There are 3 answers

Stefan Bodewig On

You never close the BZip2CompressorOutputStream, which means the final (and likely only) block of data never gets written to the wrapped stream. Close it before you call toByteArray() on the ByteArrayOutputStream.
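As a minimal sketch of that fix, assuming the whole input already fits in a byte array: the class name InMemoryBzip2 is made up for illustration, and the method name is taken from the question. The compressor is wrapped in try-with-resources so it is closed before the buffer is read.

```java
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical holder class, used only for this illustration.
public class InMemoryBzip2 {

    // Compresses inputBytes to bzip2 entirely in memory.
    public static byte[] compressBZIP2(byte[] inputBytes) throws IOException {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        // try-with-resources closes the compressor, which writes the pending
        // block and the end of the bzip2 stream into byteArrayOutputStream.
        try (BZip2CompressorOutputStream bzip2 =
                new BZip2CompressorOutputStream(byteArrayOutputStream)) {
            bzip2.write(inputBytes);
        }
        // Read the buffer only after the compressor has been closed.
        return byteArrayOutputStream.toByteArray();
    }
}
```

Calling InMemoryBzip2.compressBZIP2(contentBytes) should then return a complete bzip2 stream rather than just the first few header bytes.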

prayagupa On

I had the same problem with Apache Commons Compress: bzip2 was only writing 3 bytes. I ended up replacing ByteArrayOutputStream with FileOutputStream. Note that the example also calls close() on the compressor stream before reading the result back.

Example on Java 12:

import org.apache.commons.compress.compressors.CompressorOutputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import java.nio.charset.StandardCharsets;

import java.util.Base64;

public static String compress(String data) throws IOException {
        var bzip2TempFile = new File("/tmp/compressed.bzip2");
        bzip2TempFile.deleteOnExit();

        // Closing the compressor stream writes the final bzip2 block to the file;
        // try-with-resources does this even if write() throws.
        try (CompressorOutputStream burrowZip2OutputStream =
                new BZip2CompressorOutputStream(new FileOutputStream(bzip2TempFile), 9)) {
            burrowZip2OutputStream.write(data.getBytes(StandardCharsets.UTF_8));
        }

        // Read the compressed file back and return it Base64-encoded.
        try (var is = new FileInputStream(bzip2TempFile)) {
            return Base64.getEncoder().encodeToString(is.readAllBytes());
        }
}

Testing:

input: pirem
bzip2: BZh91AY&SYZF???"P 0???P??H?

H?@
base64 encoded: QlpoOTFBWSZTWRhaRoIAAAGBgAIiUAAgADDNAMGgUOLuSKcKEgMLSNBA

Grégoire C On

For those interested in using Apache Commons Compress, but looking for the in-memory bunzip2, here is a tested implementation:

import org.apache.commons.compress.compressors.CompressorInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;

(..)

private static final int bufferSize = 8192;

private void uncompress(CompressorInputStream compressorInputStream, 
        ByteArrayOutputStream byteArrayOutputStream) throws IOException {
    final byte[] buffer = new byte[bufferSize];
    int n = 0;
    while (-1 != (n = compressorInputStream.read(buffer))) {
        byteArrayOutputStream.write(buffer, 0, n);
    }
    compressorInputStream.close();
    byteArrayOutputStream.close();
}

public ByteArrayOutputStream bunzip2(FileInputStream inputStream) throws IOException {
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    BZip2CompressorInputStream bZip2CompressorInputStream = new BZip2CompressorInputStream(inputStream);
    uncompress(bZip2CompressorInputStream, byteArrayOutputStream);
    return byteArrayOutputStream;
}
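
If the compressed data is already held in a byte array rather than a file, the same read loop can be fed from a ByteArrayInputStream. The following is only a sketch under that assumption; the class name InMemoryBunzip2 and the byte[] signature are illustrative and not part of the code above.

```java
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

// Hypothetical holder class, used only for this illustration.
public class InMemoryBunzip2 {

    private static final int BUFFER_SIZE = 8192;

    // Decompresses a complete bzip2 stream held in memory.
    public static byte[] bunzip2(byte[] compressedBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // try-with-resources closes the decompressor when the loop is done.
        try (BZip2CompressorInputStream in = new BZip2CompressorInputStream(
                new ByteArrayInputStream(compressedBytes))) {
            byte[] buffer = new byte[BUFFER_SIZE];
            int n;
            // Copy decompressed bytes until the end of the stream is reached.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```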

Hope this helps someone!