Java: concatenating compressed data with Deflater or GZIPOutputStream

We have a bunch of threads that each take a block of data, compress it, and eventually concatenate the results into one large byte array. If anyone can expand on this idea or recommend another method, that'd be awesome. I'm currently trying out two methods, but neither is working the way it should:


The first: I have each thread's run() method take the input data, compress it with GZIPOutputStream, and write it to the buffer.

The problem with this approach is that each thread holds only one block of a longer piece of data, so when I call GZIPOutputStream it treats that small block as a complete piece of data to compress. That means it adds a header and trailer to every block (I also use a custom dictionary, so I have no idea how many bits long the header is now, nor how to find out).

I think you could manually cut off the header and trailer of each block so that only the compressed data is left (keeping the header of the first block and the trailer of the last block). What I'm not sure about is whether I can even do that. If I leave the header on the first block of data, will it still decompress correctly? Doesn't that header contain information for ONLY the first block of the data and not the other concatenated blocks?
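For comparison, here is a minimal sketch of the simplest variant of this first method: leave every header and trailer in place and concatenate the complete gzip members in block order. The gzip format allows a file to consist of several members back to back, and GZIPInputStream (Java 7 and later) reads them as one continuous stream. Each member's trailer holds the CRC-32 and length of that member's own data, which is why stripping trailers by hand would break the checks. The class and method names are made up for illustration, and the sketch assumes no custom dictionary is used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
class GzipMembers {
    // Compress one block into a complete gzip member (header + data + trailer).
    // Each worker thread could call this on its own block.
    static byte[] gzipBlock(byte[] block) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(block);
        }
        return bytes.toByteArray();
    }

    // Concatenate the members in block order, headers and trailers included.
    static byte[] concat(byte[]... members) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] m : members) {
            out.write(m, 0, m.length);
        }
        return out.toByteArray();
    }

    // Decompress a (possibly multi-member) gzip byte array.
    static byte[] gunzip(byte[] gz) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

The cost of this variant is one header and trailer per block (18 bytes with the default header) plus slightly worse compression, because each block starts with an empty history window.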


The second method is to use the Deflater class. In that case, I can simply set the input, set the dictionary, and then call deflate().

The problem is that the output is not in gzip format. It's just "raw" compressed data, and I have no idea how to make the final output something gzip can recognize.
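One way to turn raw deflate output into a single gzip stream is to add the gzip wrapper by hand, roughly as sketched below. Each block is compressed with a Deflater in "nowrap" mode. Every block except the last ends with a SYNC_FLUSH, so it stops on a byte boundary without marking the end of the stream; the last block calls finish(). The pieces are then placed between a fixed 10-byte gzip header and a trailer holding the CRC-32 and length of all the uncompressed data. The class and method names are hypothetical, each block is compressed independently, and the custom dictionary is left out of the sketch.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of any library.
class RawDeflateGzip {
    // Minimal gzip header: magic 1f 8b, CM=8 (deflate), no flags, no mtime, XFL=0, OS=0.
    private static final byte[] HEADER = {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, 0};

    // Raw-deflate one block; safe to call from separate threads (one Deflater per call).
    static byte[] deflateBlock(byte[] block, boolean last) {
        Deflater d = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = raw, no zlib wrapper
        try {
            d.setInput(block);
            if (last) {
                d.finish(); // only the last block may end the deflate stream
            }
            int flush = last ? Deflater.NO_FLUSH : Deflater.SYNC_FLUSH;
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            do {
                n = d.deflate(buf, 0, buf.length, flush);
                out.write(buf, 0, n);
                // A full buffer after SYNC_FLUSH means more output may be pending.
            } while (last ? !d.finished() : n == buf.length);
            return out.toByteArray();
        } finally {
            d.end();
        }
    }

    // Wrap the compressed pieces (in block order) in one gzip header and trailer.
    static byte[] assemble(List<byte[]> originals, List<byte[]> compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(HEADER, 0, HEADER.length);
        CRC32 crc = new CRC32();
        long total = 0;
        for (int i = 0; i < originals.size(); i++) {
            byte[] c = compressed.get(i);
            out.write(c, 0, c.length);
            crc.update(originals.get(i)); // CRC covers the uncompressed data
            total += originals.get(i).length;
        }
        writeIntLE(out, crc.getValue()); // CRC-32, little-endian
        writeIntLE(out, total);          // uncompressed size modulo 2^32
        return out.toByteArray();
    }

    private static void writeIntLE(ByteArrayOutputStream out, long v) {
        for (int i = 0; i < 4; i++) {
            out.write((int) (v >>> (8 * i)) & 0xff);
        }
    }

    // Round-trip check helper.
    static byte[] gunzip(byte[] gz) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

In this sketch the CRC-32 is computed sequentially while assembling, which means the uncompressed blocks must still be available at that point. The compression itself, usually the expensive part, can still run in parallel.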

1 answer

Answer by user207421:

You need a single method that writes to one GZIPOutputStream and is called by all the threads, with suitable coordination between them so the data doesn't get mixed up. Alternatively, have the threads write to temporary files, then assemble and gzip everything in a second phase.
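A minimal sketch of the first suggestion, under some assumptions: the worker threads only prepare their blocks, and a single writer thread collects the results in submission order and feeds them to one GZIPOutputStream. Ordering comes from waiting on the futures one by one, so no locking around the stream is needed. The class name OrderedGzipWriter and the use of Callable tasks are illustrative choices, not something the answer prescribes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class, not part of any library.
class OrderedGzipWriter {
    // Run the block producers on a thread pool, then gzip their output
    // through one stream, in the order the producers were listed.
    static byte[] produceAndGzip(List<Callable<byte[]>> producers, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (Callable<byte[]> p : producers) {
                pending.add(pool.submit(p));
            }
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                // Only this thread touches the stream, so blocks cannot interleave.
                for (Future<byte[]> f : pending) {
                    gz.write(f.get()); // blocks until that producer is done
                }
            }
            return bytes.toByteArray();
        } finally {
            pool.shutdown();
        }
    }

    // Round-trip check helper.
    static byte[] gunzip(byte[] gz) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }
}
```

Note that in this arrangement the compression itself runs on one thread; only the work of producing the blocks is parallel. The output is an ordinary single-member gzip stream.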