Is there a problem with IO.Compression?


I've just started compressing files in VB.Net, using the following code. Since I'm targeting Fx 2.0, I can't use the Stream.CopyTo method.

My code, however, gives extremely poor results compared to the gzip Normal compression profile in 7-Zip. For example, my code compressed a 630MB Outlook archive to 740MB, while 7-Zip gets it down to 490MB.

Here is the code. Is there a blatant mistake (or several)?

Using Input As New IO.FileStream(SourceFile, IO.FileMode.Open, IO.FileAccess.Read, IO.FileShare.Read)
    Using outFile As IO.FileStream = IO.File.Create(DestFile)
        Using Compress As IO.Compression.GZipStream = New IO.Compression.GZipStream(outFile, IO.Compression.CompressionMode.Compress)
            'TODO: Figure out the right buffer size.
            Dim Buffer(524287) As Byte ' 512 KiB; VB arrays are zero-based, so the upper bound is size - 1
            Dim ReadBytes As Integer = 0

            While True
                ReadBytes = Input.Read(Buffer, 0, Buffer.Length)
                If ReadBytes <= 0 Then Exit While
                Compress.Write(Buffer, 0, ReadBytes)
            End While
        End Using
    End Using
End Using

I've tried with multiple buffer sizes, but I get similar compression times, and exactly the same compression ratio.
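That matches how Deflate works: the buffer size only changes how input is fed to the compressor, not what comes out. A quick sketch with Python's gzip module (illustration only, not the VB code above) shows the output is byte-identical regardless of chunk size:

```python
import gzip
import io
import os

def gzip_chunks(data: bytes, chunk: int) -> bytes:
    # Feed the compressor in fixed-size chunks, like the VB loop above.
    # mtime=0 pins the gzip header timestamp so outputs are comparable.
    buf = io.BytesIO()
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        for i in range(0, len(data), chunk):
            gz.write(data[i:i + chunk])
    return buf.getvalue()

data = os.urandom(100_000) + b"hello world" * 10_000
small = gzip_chunks(data, 4 * 1024)
large = gzip_chunks(data, 512 * 1024)
print(small == large)  # True: chunk size affects I/O, not the compressed bytes
```

So the buffer size is purely an I/O tuning knob; the identical compression ratio you observed is expected.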


There are 3 answers

Jeffrey Hantin (Best Answer)

EDIT, or actually rewrite: It looks like the BCL coders decided to phone it in.

The implementation in System.dll version 2.0 uses statically defined, hardcoded Huffman trees optimized for plain ASCII text, rather than adaptively generating the Huffman trees as other implementations do. It also doesn't support the stored-block fallback (which is how standard GZip/Deflate implementations avoid runaway expansion). As a result, running anything other than plain text through their implementation produces a file much larger than the input, and Microsoft claims this is by design!
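The stored-block fallback is easy to demonstrate with any spec-compliant Deflate implementation, e.g. Python's zlib (illustration only, not the poster's code):

```python
import os
import zlib

# Incompressible input: a compliant Deflate encoder gives up and emits
# "stored" (uncompressed) blocks, capping expansion at roughly 5 bytes
# per 64 KiB block plus a few bytes of header/trailer.
data = os.urandom(1_000_000)
compressed = zlib.compress(data, 9)
overhead = len(compressed) - len(data)
print(overhead)  # a few hundred bytes, not hundreds of megabytes
```

Without that fallback, every incompressible block still gets run through the (badly tuned) Huffman coder, which is how a 630MB archive can balloon to 740MB.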

Save yourself some pain and grab a third-party implementation.

Jonathan Allen

IO.Compression wasn't really made for this. It was created to support XPS, the XML Paper Specification. Currently you have to use a third-party library if you want decent file compression.

DarrenMB

Some additional information that may be useful. I was compressing some static files (binary) to include in a project release and had the same issue where the file size increased with IO.Compression.GZipStream.

I decided to use Ionic.Zip instead, where the best compression level could be used.

One thing I noticed immediately is that even though Ionic.Zip reduced my files to 25% of their original size, compressing was about 3-4 times slower (totally expected). The surprise was that decompressing was also about 3 times slower, taking 1.6 seconds compared to 0.5 seconds.

Since GZip is a standard format, the streams are interchangeable: the built-in IO.Compression.GZipStream in .NET was far less space-efficient at compressing, but far faster at decompressing.

So in production I use the Ionic.Zip library's ZLib.GZipStream to compress the files and IO.Compression.GZipStream to decompress them much faster.
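That tradeoff is inherent to Deflate-family compressors: the effort level chosen at compression time barely affects decompression, which just replays the stream. A small sketch with Python's gzip module (the sample data is a made-up stand-in for the static binary files above):

```python
import gzip

# Moderately redundant sample data (hypothetical stand-in for the
# static files mentioned above).
data = b"".join(
    b"record %d: payload=%d\n" % (i % 100, (i * i) % 997)
    for i in range(50_000)
)

fast = gzip.compress(data, compresslevel=1)
best = gzip.compress(data, compresslevel=9)

# Higher levels spend extra CPU searching for longer matches while
# compressing; the decompressor's work is the same either way, so its
# speed is largely independent of the level that produced the stream.
print(len(fast), len(best))
assert gzip.decompress(best) == data
```

Mixing compressors this way works precisely because both sides speak the same GZip wire format.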