How to create a multipart zip file and read it back?


How would I properly zip bytes to a ByteArrayOutputStream and then read that using a ByteArrayInputStream? I have the following method:

private byte[] getZippedBytes(final String fileName, final byte[] input) throws Exception {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    ZipOutputStream zipOut = new ZipOutputStream(bos);
    ZipEntry entry = new ZipEntry(fileName);
    entry.setSize(input.length);
    zipOut.putNextEntry(entry);
    zipOut.write(input, 0, input.length);
    zipOut.closeEntry();
    zipOut.close();

    //Turn right around and unzip what we just zipped
    ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(bos.toByteArray()));

    while((entry = zipIn.getNextEntry()) != null) {
        assert entry.getSize() >= 0;
    }

    return bos.toByteArray();
}

When I execute this code, the assertion inside the loop fails because entry.getSize() returns -1. I don't understand why the extracted entry doesn't match the entry that was zipped.


There are 2 answers

Loris Securo On BEST ANSWER

Why is the size -1?

Calling getNextEntry on a ZipInputStream only positions the read cursor at the start of the next entry.

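The size (along with other metadata such as the CRC) is stored after the entry's data, in a data descriptor, so it is not available while the cursor is still at the start of the entry. ZipOutputStream writes it there for deflated entries unless the size, compressed size and CRC are all set before putNextEntry is called; setting only the size, as in your code, is not enough.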

This information becomes available only after you read the whole entry data or move on to the next entry.

For example, going to the next entry:

// position at the start of the first entry
entry = zipIn.getNextEntry();
ZipEntry firstEntry = entry;    
// size is not yet available
System.out.println("before " + firstEntry.getSize()); // prints -1

// position at the start of the second entry
entry = zipIn.getNextEntry();
// size is now available
System.out.println("after " + firstEntry.getSize()); // prints the size

or reading the whole entry data:

// position at the start of the first entry
entry = zipIn.getNextEntry();
// size is not yet available
System.out.println("before " + entry.getSize()); // prints -1

// read the whole entry data
while(zipIn.read() != -1);

// size is now available
System.out.println("after " + entry.getSize()); // prints the size

Your misunderstanding is quite common and there are a number of bug reports regarding this problem (which are closed as "Not an Issue"), like JDK-4079029, JDK-4113731, JDK-6491622.

As the bug reports also mention, you could use ZipFile instead of ZipInputStream, which lets you read the size before accessing the entry data; but to create a ZipFile you need a File (see the constructors) rather than a byte array.

For example:

File file = new File("test.zip");
// try-with-resources closes the ZipFile when done
try (ZipFile zipFile = new ZipFile(file)) {
    Enumeration<? extends ZipEntry> entries = zipFile.entries();
    while (entries.hasMoreElements()) {
        ZipEntry zipEntry = entries.nextElement();
        System.out.println(zipEntry.getSize()); // prints the size
    }
}
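
If the archive only exists as a byte array, one possible workaround is to write it to a temporary file first and open that with ZipFile. The following is a minimal sketch under that assumption, not something from the bug reports; the class name ZipFileFromBytes and the helpers zip and firstEntrySize are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ZipFileFromBytes {

    // Hypothetical helper: builds a one-entry archive in memory.
    static byte[] zip(String name, byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            zipOut.putNextEntry(new ZipEntry(name));
            zipOut.write(data, 0, data.length);
            zipOut.closeEntry();
        }
        return bos.toByteArray();
    }

    // Writes the archive to a temporary file, opens it with ZipFile and
    // returns the size of the first entry (-1 if the archive is empty).
    static long firstEntrySize(byte[] zipped) throws IOException {
        Path tmp = Files.createTempFile("archive", ".zip");
        try {
            Files.write(tmp, zipped);
            try (ZipFile zipFile = new ZipFile(tmp.toFile())) {
                Enumeration<? extends ZipEntry> entries = zipFile.entries();
                return entries.hasMoreElements() ? entries.nextElement().getSize() : -1;
            }
        } finally {
            Files.deleteIfExists(tmp); // always clean up the temporary file
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "hello zip".getBytes(StandardCharsets.UTF_8);
        // The size is available without reading the entry data first
        System.out.println(firstEntrySize(zip("hello.txt", input)));
    }
}
```

ZipFile takes the sizes from the central directory at the end of the archive, which is why it needs random access to a file rather than a forward-only stream.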

How to get the data from the input stream?

If you want to check if the unzipped data is equal to the original input data, you could read from the input stream like so:

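byte[] output = new byte[input.length];
entry = zipIn.getNextEntry();
// a single read() may return fewer bytes than requested, so loop until output is full
int off = 0, n;
while (off < output.length && (n = zipIn.read(output, off, output.length - off)) != -1) {
    off += n;
}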

System.out.println("Are they equal? " + Arrays.equals(input, output));

// and if we want the size
zipIn.getNextEntry(); // or zipIn.read();
System.out.println("and the size is " + entry.getSize());

Now output should have the same content as input.
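
Putting the pieces together, a complete in-memory round trip could look like the sketch below. It assumes a single-entry archive; the class name ZipRoundTrip and the methods zip and unzipFirst are hypothetical names for illustration, and the one-element long array is just a simple way to hand the size back to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipRoundTrip {

    // Zips a single entry into an in-memory archive.
    static byte[] zip(String fileName, byte[] input) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ZipOutputStream zipOut = new ZipOutputStream(bos)) {
            zipOut.putNextEntry(new ZipEntry(fileName));
            zipOut.write(input, 0, input.length);
            zipOut.closeEntry();
        }
        return bos.toByteArray();
    }

    // Reads the first entry back and returns its data. sizeOut[0] receives
    // entry.getSize() as reported after the data has been fully consumed.
    static byte[] unzipFirst(byte[] zipped, long[] sizeOut) throws IOException {
        try (ZipInputStream zipIn = new ZipInputStream(new ByteArrayInputStream(zipped))) {
            ZipEntry entry = zipIn.getNextEntry();
            if (entry == null) {
                throw new IOException("archive has no entries");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            // read until the end of the entry; this also makes the size available
            while ((n = zipIn.read(buf, 0, buf.length)) != -1) {
                out.write(buf, 0, n);
            }
            sizeOut[0] = entry.getSize();
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "hello zip".getBytes(StandardCharsets.UTF_8);
        long[] size = new long[1];
        byte[] output = unzipFirst(zip("hello.txt", input), size);
        System.out.println("Are they equal? " + Arrays.equals(input, output));
        System.out.println("and the size is " + size[0]);
    }
}
```

Reading into a growing ByteArrayOutputStream means the caller does not need to know the uncompressed length in advance, unlike the fixed-size output array above.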

Matthieu On

How to zip byte[] and unzip it back?

I routinely use the following methods to deflate/inflate (zip/unzip) small byte[] (i.e. when it fits in memory). They are based on the example given in the Deflater javadoc and use the Deflater class to compress the data and the Inflater class to decompress it. Note that the result is zlib-compressed data, not a ZIP archive with named entries:

public static byte[] compress(byte[] source, int level) {
    Deflater compresser = new Deflater(level);
    compresser.setInput(source);
    compresser.finish();
    byte[] buf = new byte[1024];
    ByteArrayOutputStream bos = new ByteArrayOutputStream(1024);
    int n;
    while ((n = compresser.deflate(buf)) > 0)
        bos.write(buf, 0, n);
    compresser.end();
    return bos.toByteArray(); // You could as well return "bos" directly
}

public static byte[] uncompress(byte[] source) {
    Inflater decompresser = new Inflater();
    decompresser.setInput(source);
    byte[] buf = new byte[1024];
    ByteArrayOutputStream bos = new ByteArrayOutputStream(1024);
    try {
        int n;
        while ((n = decompresser.inflate(buf)) > 0)
            bos.write(buf, 0, n);
        return bos.toByteArray();
    } catch (DataFormatException e) {
        return null;
    } finally {
        decompresser.end();
    }
}

There is no need for a ByteArrayInputStream, but you could use an InflaterInputStream wrapping it, if you really want to (but using the Inflater directly is easier).
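
For completeness, the stream-based variant mentioned above might look like this. It is only a sketch: the class name StreamInflate is hypothetical, and the use of DeflaterOutputStream on the compressing side is my own addition to keep the example self-contained. Both stream classes use the default zlib format, so the output should be interchangeable with the Deflater/Inflater methods above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class StreamInflate {

    // Compresses source into zlib format using a stream wrapper.
    static byte[] compress(byte[] source) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(source);
        } // closing the stream finishes the compressed data
        return bos.toByteArray();
    }

    // Decompresses zlib data by wrapping a ByteArrayInputStream
    // in an InflaterInputStream and reading until end of stream.
    static byte[] uncompress(byte[] source) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (InputStream in = new InflaterInputStream(new ByteArrayInputStream(source))) {
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                bos.write(buf, 0, n);
            }
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] input = "some bytes to compress".getBytes(StandardCharsets.UTF_8);
        byte[] restored = uncompress(compress(input));
        System.out.println("Are they equal? " + Arrays.equals(input, restored));
    }
}
```

Unlike the uncompress method above, this version reports corrupt input as an exception instead of returning null.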