C# MemoryStream is larger than FileStream


I have a method that compresses a byte array. For testing, I wrote one version that uses a FileStream and one that uses a MemoryStream. The result from the MemoryStream version is larger, even though the compression logic is the same. Can anyone explain why?

public byte[] DeflateCompress(byte[] data2Compress)
{
    using (FileStream _fileToCompress = File.Create("_deflatecompressed.bin"))
    {
        using (DeflateStream _compressionStream = new DeflateStream(_fileToCompress, CompressionMode.Compress))
        {
            _compressionStream.Write(data2Compress, 0, data2Compress.Length);
            _compressionStream.Close();
        }
    }

    return File.ReadAllBytes("_deflatecompressed.bin");
}

public byte[] DeflateCompress(byte[] data2Compress)
{
    using (MemoryStream _memStreamCompress = new MemoryStream())
    {
        using (DeflateStream _defalteStreamCompress = new DeflateStream(_memStreamCompress, CompressionMode.Compress))
        {
            _defalteStreamCompress.Write(data2Compress, 0, data2Compress.Length);
            _defalteStreamCompress.Close();
        }

        return _memStreamCompress.GetBuffer();
    }
}

If I write the returned byte array to a file, the one produced by the MemoryStream version is larger.


There are 2 answers

Lucas Trzesniewski (BEST ANSWER)

MemoryStream.GetBuffer() returns the whole internal buffer, which can be larger than the data written to it. The buffer grows in chunks as needed: when a write exceeds the current capacity, the internal buffer size is doubled.

If you need to convert the MemoryStream to a byte array containing only the data, use MemoryStream.ToArray(). It creates a new array of exactly the right size and copies the relevant buffer contents into it.
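Applied to the method from the question, the change could look like the following. This is only a sketch: the class name DeflateExample is made up for illustration, and the redundant Close() call is dropped because the using block already disposes the DeflateStream.

```csharp
using System.IO;
using System.IO.Compression;

public static class DeflateExample
{
    // Compresses the input in memory and returns only the bytes that were actually written.
    public static byte[] DeflateCompress(byte[] data2Compress)
    {
        using (MemoryStream memStream = new MemoryStream())
        {
            // Disposing the DeflateStream flushes the remaining compressed data
            // into memStream (and closes memStream as well).
            using (DeflateStream deflateStream = new DeflateStream(memStream, CompressionMode.Compress))
            {
                deflateStream.Write(data2Compress, 0, data2Compress.Length);
            }

            // ToArray still works on a closed MemoryStream and copies
            // only the written bytes, not the unused capacity.
            return memStream.ToArray();
        }
    }
}
```

With this version, the returned array should have the same length as the file produced by the FileStream variant for the same input.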

As MSDN puts it:

Note that the buffer contains allocated bytes which might be unused. For example, if the string "test" is written into the MemoryStream object, the length of the buffer returned from GetBuffer is 256, not 4, with 252 bytes unused. To obtain only the data in the buffer, use the ToArray method; however, ToArray creates a copy of the data in memory.
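The difference described in that note is easy to observe. Here is a small sketch (BufferSizeExample and Measure are names invented for this example) that writes "test" into a fresh MemoryStream and returns both lengths. The exact buffer length is an implementation detail of the runtime, so the code only reports it rather than assuming a particular value.

```csharp
using System.IO;
using System.Text;

public static class BufferSizeExample
{
    // Returns { internal buffer length, data length } after writing "test".
    public static int[] Measure()
    {
        using (var stream = new MemoryStream())
        {
            byte[] text = Encoding.ASCII.GetBytes("test");
            stream.Write(text, 0, text.Length);

            int bufferLength = stream.GetBuffer().Length; // allocated capacity, may include unused bytes
            int dataLength = stream.ToArray().Length;     // only the bytes actually written

            return new[] { bufferLength, dataLength };
        }
    }
}
```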

GetBuffer is useful when you want to read a chunk of the buffer and can track the valid length yourself (for example, via the stream's Length property). ToArray is slower because it copies the whole contents on each call, whereas GetBuffer simply returns a reference to the existing buffer.

For instance, GetBuffer can be useful with methods such as Stream.Write, which take a buffer together with an offset and a count:

public abstract void Write(
    byte[] buffer,
    int offset,
    int count
)

Many places in the framework have overloads like this, which take a buffer but only process part of it.
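As a rough illustration of that pattern, the sketch below passes the internal buffer to Stream.Write and uses the stream's Length as the count, so the unused tail of the buffer is never touched. The helper name CopyWritten is hypothetical, and the code assumes the source stream is still open and was created with a publicly visible buffer (as with the parameterless constructor), since GetBuffer can throw otherwise.

```csharp
using System.IO;

public static class GetBufferExample
{
    // Hypothetical helper: writes only the used part of source's buffer to target,
    // without allocating an intermediate array the way ToArray would.
    public static void CopyWritten(MemoryStream source, Stream target)
    {
        byte[] buffer = source.GetBuffer(); // reference to the internal buffer, no copy
        int count = (int)source.Length;     // number of bytes actually written
        target.Write(buffer, 0, count);
    }
}
```

For this particular case MemoryStream also offers a WriteTo(Stream) method; the offset/count form matters more when the consuming API only accepts a byte array.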

TomTom

MemoryStream uses a byte array internally, which is doubled in size when more room is needed.

So the buffer likely contains a lot of unused bytes.
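To see the growth for yourself, something like the following sketch can record the Capacity values a MemoryStream goes through while bytes are written one at a time. CapacityGrowthExample and ObserveCapacities are made-up names, and the exact sequence depends on the runtime's growth policy, so no particular numbers are assumed here.

```csharp
using System.Collections.Generic;
using System.IO;

public static class CapacityGrowthExample
{
    // Writes totalBytes single bytes and records each distinct Capacity value seen.
    public static List<int> ObserveCapacities(int totalBytes)
    {
        var capacities = new List<int>();
        using (var stream = new MemoryStream())
        {
            for (int i = 0; i < totalBytes; i++)
            {
                stream.WriteByte(0);

                // Record the capacity only when it changes, i.e. when the buffer was resized.
                if (capacities.Count == 0 || capacities[capacities.Count - 1] != stream.Capacity)
                {
                    capacities.Add(stream.Capacity);
                }
            }
        }
        return capacities;
    }
}
```

Whatever the last recorded capacity is, that is the length of the array GetBuffer would hand back, regardless of how many bytes were written.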