Performance of GZIPOutputStream vs BufferedOutputStream

My application logs a large amount of video and I2C sensor data to a disk file, as fast as possible. Currently I convert everything to bytes and write it with a BufferedOutputStream. @Siguza was kind enough to suggest looking into a GZIPOutputStream for this. I was wondering whether you had any thoughts on the performance pros and cons. I am thinking the processor is way ahead and the disk write is the bottleneck, so I am hoping that compressing on the fly via a GZIPOutputStream before the write might be a good strategy. Any thoughts on this are greatly welcome.

Added: in response to comments ...

It turns out zipping is not that processor-expensive, and the way I had asked the original question was not great, as erwin rightly pointed out. The comparison is not between a BufferedOutputStream and a GZIPOutputStream: both the zipped and the unzipped stream need to be wrapped in a BufferedOutputStream. The real question is how much cost is added if the original FileOutputStream is first wrapped in a GZIPOutputStream before that is wrapped in a BufferedOutputStream. Here is the answer. I am using this code:

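import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPOutputStream;

// bytes of the int 1 (RHUtilities is a helper class not shown here)
byte[] bs = RHUtilities.toByteArray((int)1);
boolean zipped = false;

FileOutputStream fos = new FileOutputStream(datFile);
BufferedOutputStream bos;
if (zipped) {
    // buffer on the outside so the small writes are batched before compression
    GZIPOutputStream gz = new GZIPOutputStream(fos);
    bos = new BufferedOutputStream(gz);
} else {
    bos = new BufferedOutputStream(fos);
}
long startT = System.currentTimeMillis();
for (int i = 0; i < 1000000; i++) {
    bos.write(bs);
}
bos.flush();
// elapsed ms; the final close(), which lets gzip finish its stream, is not timed
System.out.println(System.currentTimeMillis() - startT);
bos.close();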

My 2012 macpro laptop writes 1M ints (4 MB of raw data at 4 bytes per int) with

zipped=true in 38 ms, file size 4 KB
zipped=false in 21 ms, file size 4 MB

And, yes, I like the compression :-)

Read performance is almost identical, 83 vs 86 ms, between

FileInputStream fin = new FileInputStream(datFile);

and

GZIPInputStream gin = new GZIPInputStream(new FileInputStream(datFile));

all good ...

There is 1 answer

Stephen C (best answer)

There are a whole lot of issues raised by this question:

"I am thinking the processor is way ahead and the disk write is the bottleneck"

"I am thinking" is not a sound basis for optimizing performance. You need to do some measurements to find out where the bottleneck actually is. (If your "thinking" is wrong, then changing to GZIPOutputStream is liable to make things worse.)

Alternatively, just try it, and measure whether it improves performance or not.
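If you go the "just try it" route, a small harness along these lines may help. It is only a sketch: the class name WriteBenchmark, the method timeWrite, the 4-byte stand-in record and the temp files are all made up for illustration, and a single timed run like this is noisy (OS caching, JIT warm-up), so treat the numbers as a rough indication rather than a proper benchmark. Unlike a timing that stops at flush(), this one includes close(), so the gzip case is charged for finishing its stream.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

// Hypothetical harness; names are illustrative only.
final class WriteBenchmark {

    /** Writes `count` copies of `record` to `file`; returns elapsed nanoseconds including close(). */
    static long timeWrite(Path file, byte[] record, int count, boolean zipped) throws IOException {
        long start = System.nanoTime();
        try (OutputStream raw = Files.newOutputStream(file);
             OutputStream out = new BufferedOutputStream(zipped ? new GZIPOutputStream(raw) : raw)) {
            for (int i = 0; i < count; i++) {
                out.write(record);
            }
        } // closing flushes the buffer and lets gzip write its trailer
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        byte[] record = {0, 0, 0, 1}; // stand-in for one 4-byte sample
        for (boolean zipped : new boolean[] {false, true}) {
            Path file = Files.createTempFile("bench", zipped ? ".gz" : ".dat");
            try {
                timeWrite(file, record, 100_000, zipped); // warm-up run, result discarded
                long nanos = timeWrite(file, record, 1_000_000, zipped);
                System.out.printf("zipped=%b: %d ms, %d bytes%n",
                        zipped, nanos / 1_000_000, Files.size(file));
            } finally {
                Files.deleteIfExists(file);
            }
        }
    }
}
```

To get a meaningful answer, feed it a sample of your real video and sensor bytes instead of the constant record, since highly repetitive test data flatters the compressor.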

From a theoretical perspective, if there were a significant mismatch between processor and disk speed, then compression could help. And one possible upside is that compression could also save disk space.

But the downsides are:

  • compression is relatively expensive (and so is decompression), so you may end up using more (elapsed) time than you are gaining by reducing I/O,
  • compression is ineffective on small files,
  • format-agnostic compression is not very effective on raw (uncompressed) audio or video data [1],
  • if your video data is already compressed, then a second compression will achieve nothing.
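How much the last two points matter depends heavily on the data, and that is easy to check. The sketch below (class and method names are made up) gzips two in-memory buffers: one highly repetitive, and one filled with pseudo-random bytes as a rough stand-in for already-compressed data. Real raw video or sensor data will land somewhere between these extremes, so substitute a sample of your own.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; names are illustrative only.
final class CompressibilityDemo {

    /** Returns the number of bytes `data` occupies after gzip compression. */
    static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(data);
        }
        return sink.size();
    }

    public static void main(String[] args) throws IOException {
        int n = 1 << 20;                  // 1 MiB per buffer
        byte[] repetitive = new byte[n];  // all zeros: the best case for gzip
        byte[] noisy = new byte[n];
        new Random(42).nextBytes(noisy);  // stand-in for already-compressed data

        System.out.println("repetitive: " + n + " -> " + gzipSize(repetitive) + " bytes");
        System.out.println("noisy:      " + n + " -> " + gzipSize(noisy) + " bytes");
    }
}
```

If the second figure is close to (or above) the input size for your real data, the CPU time spent compressing buys you nothing.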

Finally, it could be a "lots of small files" problem. If you attempt to read and write lots of little files, the bottleneck is unlikely to be raw disk speed. Rather, it is likely to be the OS's ability to read and write directories and/or file metadata. If that is where your problem is, then you should be looking at bundling the "lots of little files" into archives; e.g. TAR or ZIP files. There are libraries for doing this in Java.
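For ZIP, the JDK's own java.util.zip package is enough. Here is a minimal sketch, assuming your records are already in memory as named byte arrays; the class SmallFileBundler, the method bundle and the entry names are hypothetical. (TAR is not in the JDK, so that would need a third-party library.)

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper; names are illustrative only.
final class SmallFileBundler {

    /** Writes every (name, bytes) pair as one entry of a single ZIP archive. */
    static void bundle(Path archive, Map<String, byte[]> records) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(
                new BufferedOutputStream(Files.newOutputStream(archive)))) {
            for (Map.Entry<String, byte[]> record : records.entrySet()) {
                zip.putNextEntry(new ZipEntry(record.getKey())); // start a new entry
                zip.write(record.getValue());                    // entry payload
                zip.closeEntry();                                // finish this entry
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> records = new LinkedHashMap<>();
        records.put("frame-0001.dat", new byte[] {1, 2, 3});
        records.put("frame-0002.dat", new byte[] {4, 5, 6});
        Path archive = Files.createTempFile("bundle", ".zip");
        bundle(archive, records);
        System.out.println(archive + ": " + Files.size(archive) + " bytes");
    }
}
```

This turns many directory and metadata updates into appends to one open file, which is the point of the suggestion above.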

And another benefit of archives is that they can make compression more effective.
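To see why, compare gzipping many small chunks one by one against gzipping them as a single stream. The sketch below uses a made-up text record as the sample, and the class ArchiveVsPerFile is hypothetical. Note the caveat: this gain applies when one compressed stream spans the whole bundle (as with TAR followed by gzip). ZIP compresses each entry on its own, so it fixes the metadata problem but not this one.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class; names are illustrative only.
final class ArchiveVsPerFile {

    /** Returns the number of bytes `data` occupies after gzip compression. */
    static int gzipSize(byte[] data) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(data);
        }
        return sink.size();
    }

    /** Returns {total size when each chunk is gzipped alone, size when all are gzipped together}. */
    static int[] compare(int files, byte[] sample) throws IOException {
        int separate = 0;
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (int i = 0; i < files; i++) {
            separate += gzipSize(sample); // pays gzip header and trailer every time
            all.write(sample);
        }
        return new int[] {separate, gzipSize(all.toByteArray())};
    }

    public static void main(String[] args) throws IOException {
        byte[] sample = "t=0 accel=0.01,0.02,0.98\n".getBytes(StandardCharsets.US_ASCII);
        int[] sizes = compare(1000, sample);
        System.out.println("gzipped separately: " + sizes[0] + " bytes");
        System.out.println("gzipped together:   " + sizes[1] + " bytes");
    }
}
```

Identical chunks exaggerate the effect, but the fixed per-stream overhead and the lost chance to reuse patterns across chunks are real for any small-file workload.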


[1] For background, read https://en.wikipedia.org/wiki/Lossless_compression and https://en.wikipedia.org/wiki/List_of_codecs#Lossless_video_compression