I've just started compressing files in VB.Net, using the following code. Since I'm targeting .NET Framework 2.0, I can't use the Stream.CopyTo method.
My code, however, gives extremely poor results compared to the gzip Normal compression profile in 7-Zip. For example, my code turned a 630 MB Outlook archive into a 740 MB file, while 7-Zip compresses it down to 490 MB.
Here is the code. Is there a blatant mistake (or several)?
Using Input As New IO.FileStream(SourceFile, IO.FileMode.Open, IO.FileAccess.Read, IO.FileShare.Read)
    Using outFile As IO.FileStream = IO.File.Create(DestFile)
        Using Compress As New IO.Compression.GZipStream(outFile, IO.Compression.CompressionMode.Compress)
            'TODO: Figure out the right buffer size.
            Dim Buffer(524287) As Byte ' 512 KB
            Dim ReadBytes As Integer = 0
            While True
                ReadBytes = Input.Read(Buffer, 0, Buffer.Length)
                If ReadBytes <= 0 Then Exit While
                Compress.Write(Buffer, 0, ReadBytes)
            End While
        End Using
    End Using
End Using
I've tried multiple buffer sizes, but I get similar compression times and exactly the same compression ratio.
EDIT, or actually rewrite: It looks like the BCL coders decided to phone it in.
The implementation in System.dll version 2.0 uses statically defined, hardcoded Huffman trees optimized for plain ASCII text, rather than adaptively building the Huffman trees as other implementations do. It also doesn't support stored (uncompressed) blocks, which is how standard GZip/Deflate avoids runaway expansion: per the DEFLATE spec (RFC 1951), a stored block costs about 5 bytes of overhead per 65,535 bytes of data, so the worst case is a tiny fraction of a percent, not the ~17% growth seen above. As a result, running anything other than plain text through this implementation produces a file much larger than the input, and Microsoft claims this is by design! Save yourself some pain and grab a third-party implementation.
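For comparison, here's a minimal sketch of the same copy loop written against SharpZipLib (ICSharpCode.SharpZipLib), one such third-party implementation; GZipOutputStream and SetLevel are SharpZipLib's API, and picking level 9 for best compression is just an assumption for illustration:

Imports ICSharpCode.SharpZipLib.GZip

' Sketch: same loop as above, but writing through SharpZipLib's
' GZipOutputStream instead of the BCL GZipStream.
Using Input As New IO.FileStream(SourceFile, IO.FileMode.Open, IO.FileAccess.Read, IO.FileShare.Read)
    Using outFile As IO.FileStream = IO.File.Create(DestFile)
        Using Compress As New GZipOutputStream(outFile)
            Compress.SetLevel(9) ' 1 = fastest ... 9 = best compression
            Dim Buffer(524287) As Byte
            Dim ReadBytes As Integer = Input.Read(Buffer, 0, Buffer.Length)
            While ReadBytes > 0
                Compress.Write(Buffer, 0, ReadBytes)
                ReadBytes = Input.Read(Buffer, 0, Buffer.Length)
            End While
        End Using
    End Using
End Using

SharpZipLib dates back to the .NET 1.x/2.0 era, so it should fit the Fx 2.0 constraint, and since it builds adaptive Huffman trees it won't blow up on binary input the way the BCL stream does.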