While experimenting with the new Span<byte> and Memory<byte> features, I found that Memory<byte> was much slower at parsing binary data than I expected, compared to other ways of interacting with a byte array.

I set up a benchmark suite that reads a single integer from an array using a variety of methods, and found that Memory<T> was the slowest. It was slower than Span<T>, as expected, but surprisingly it was also slower than direct use of the array, as well as my own homegrown version of what I expected Memory<T> to resemble internally.

using System;
using System.Buffers.Binary;
using BenchmarkDotNet.Attributes;

// Suite of tests comparing various ways to read an int at an offset from an array
public class BinaryTests
{
    static byte[] arr = new byte[] { 0, 1, 2, 3, 4 };
    static Memory<byte> mem = arr.AsMemory();
    static HomegrownMemory memTest = new HomegrownMemory(arr);

    [Benchmark]
    public int StraightArrayBitConverter()
    {
        return BitConverter.ToInt32(arr, 1);
    }

    [Benchmark]
    public int MemorySlice()
    {
        return BinaryPrimitives.ReadInt32LittleEndian(mem.Slice(1).Span);
    }

    [Benchmark]
    public int MemorySliceToSize()
    {
        return BinaryPrimitives.ReadInt32LittleEndian(mem.Slice(1, 4).Span);
    }

    [Benchmark]
    public int MemorySpanSlice()
    {
        return BinaryPrimitives.ReadInt32LittleEndian(mem.Span.Slice(1));
    }

    [Benchmark]
    public int MemorySpanSliceToSize()
    {
        return BinaryPrimitives.ReadInt32LittleEndian(mem.Span.Slice(1, 4));
    }

    [Benchmark]
    public int HomegrownMemorySlice()
    {
        return BinaryPrimitives.ReadInt32LittleEndian(memTest.Slice(1).Span);
    }

    [Benchmark]
    public int HomegrownMemorySliceToSize()
    {
        return BinaryPrimitives.ReadInt32LittleEndian(memTest.Slice(1, 4).Span);
    }

    [Benchmark]
    public int HomegrownMemorySpanSlice()
    {
        return BinaryPrimitives.ReadInt32LittleEndian(memTest.Span.Slice(1));
    }

    [Benchmark]
    public int HomegrownMemorySpanSliceToSize()
    {
        return BinaryPrimitives.ReadInt32LittleEndian(memTest.Span.Slice(1, 4));
    }

    [Benchmark]
    public int SpanSlice()
    {
        return BinaryPrimitives.ReadInt32LittleEndian(arr.AsSpan().Slice(1));
    }

    [Benchmark]
    public int SpanSliceToSize()
    {
        return BinaryPrimitives.ReadInt32LittleEndian(arr.AsSpan().Slice(1, 4));
    }
}

// Personal "implementation" of Memory<T>, for testing
struct HomegrownMemory
{
    byte[] _arr;
    int _startPos;
    int _length;

    public HomegrownMemory(byte[] b)
    {
        this._arr = b;
        this._startPos = 0;
        this._length = b.Length;
    }

    public Span<byte> Span => _arr.AsSpan(start: _startPos, length: _length);

    public HomegrownMemory Slice(int start)
    {
        return new HomegrownMemory()
        {
            _arr = _arr,
            _startPos = _startPos + start,
            _length = _length - start
        };
    }

    public HomegrownMemory Slice(int start, int length)
    {
        return new HomegrownMemory()
        {
            _arr = _arr,
            _startPos = _startPos + start,
            _length = length
        };
    }
}

Here are the BenchmarkDotNet results of the above code:

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.17134.765 (1803/April2018Update/Redstone4)
Intel Core i7-4790K CPU 4.00GHz (Haswell), 1 CPU, 8 logical and 4 physical cores
Frequency=3984652 Hz, Resolution=250.9629 ns, Timer=TSC
.NET Core SDK=2.1.700-preview-009618
  [Host]     : .NET Core 2.1.11 (CoreCLR 4.6.27617.04, CoreFX 4.6.27617.02), 64bit RyuJIT
  DefaultJob : .NET Core 2.1.11 (CoreCLR 4.6.27617.04, CoreFX 4.6.27617.02), 64bit RyuJIT
|                         Method |      Mean |     Error |    StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------------- |----------:|----------:|----------:|------:|------:|------:|----------:|
|      StraightArrayBitConverter | 1.0832 ns | 0.0323 ns | 0.0270 ns |     - |     - |     - |         - |
|                    MemorySlice | 5.8882 ns | 0.0654 ns | 0.0612 ns |     - |     - |     - |         - |
|              MemorySliceToSize | 6.0191 ns | 0.0983 ns | 0.0919 ns |     - |     - |     - |         - |
|                MemorySpanSlice | 5.0230 ns | 0.0626 ns | 0.0555 ns |     - |     - |     - |         - |
|          MemorySpanSliceToSize | 5.0189 ns | 0.0335 ns | 0.0313 ns |     - |     - |     - |         - |
|           HomegrownMemorySlice | 3.9217 ns | 0.0419 ns | 0.0392 ns |     - |     - |     - |         - |
|     HomegrownMemorySliceToSize | 1.5233 ns | 0.0199 ns | 0.0186 ns |     - |     - |     - |         - |
|       HomegrownMemorySpanSlice | 0.8301 ns | 0.0243 ns | 0.0227 ns |     - |     - |     - |         - |
| HomegrownMemorySpanSliceToSize | 0.8303 ns | 0.0223 ns | 0.0208 ns |     - |     - |     - |         - |
|                      SpanSlice | 0.6891 ns | 0.0241 ns | 0.0214 ns |     - |     - |     - |         - |
|                SpanSliceToSize | 0.6804 ns | 0.0174 ns | 0.0163 ns |     - |     - |     - |         - |

All of these timings make a lot of sense to me, except the Memory<T> timings, which are all slower than I would've expected.

It was my understanding that Memory<T> was simply a version of Span<T> that could live on the heap, i.e. not a ref struct.
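That storage difference is easy to demonstrate. A minimal sketch (the `Buffer` class is just an illustration, not part of the benchmark):

```csharp
using System;

// Memory<byte> can be stored on the heap (e.g. as a class field),
// which a ref struct like Span<byte> cannot.
class Buffer
{
    public Memory<byte> Data;       // compiles fine
    // public Span<byte> Data2;     // does not compile: Span<T> cannot be a class field
}

class Demo
{
    static void Main()
    {
        var buf = new Buffer { Data = new byte[] { 0, 1, 2, 3, 4 }.AsMemory() };
        Console.WriteLine(buf.Data.Length);  // 5
    }
}
```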

I would have expected it to perform slower than Span<T>, but at least on par with, if not slightly faster than, the straight array implementation. The results I achieved with my homegrown version are the results I expected out of Memory<T>.

Is there something fundamental I'm missing here about the use case for Memory<T>, or what it's trying to accomplish? Something seems off about my understanding after seeing these results.

EDIT: After Cowen's comment, I located the Memory<T> source code and took a look. It does seem to be doing a lot of work when retrieving the Span: specifically, checking and casting its generic object field to find out what type it holds so it can cast properly.
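To make the edit concrete, here is a simplified sketch of the kind of type test the real property performs (this is my own illustration, not the actual CoreFX source, which also packs flags into the index and special-cases strings):

```csharp
using System;
using System.Buffers;

// Simplified sketch of why Memory<T>.Span costs more than a plain
// array-backed wrapper: the backing store is an untyped object field
// that must be type-tested and cast on every access.
struct SketchMemory<T>
{
    object _object;   // could be a T[], a MemoryManager<T>, or (for char) a string
    int _index;
    int _length;

    public SketchMemory(T[] array)
    {
        _object = array;
        _index = 0;
        _length = array.Length;
    }

    public Span<T> Span
    {
        get
        {
            // Every .Span access pays for these checks and casts.
            if (_object is T[] array)
                return new Span<T>(array, _index, _length);
            if (_object is MemoryManager<T> manager)
                return manager.GetSpan().Slice(_index, _length);
            return default;
        }
    }
}
```

My HomegrownMemory skips all of this because its field is strongly typed as byte[], which is presumably where the difference in the timings comes from.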

I'm surprised they didn't give us different Memory<T> variants and/or a Memory factory to construct types with more strongly typed internal data fields. Instead they opted for a single object field that has to be checked and cast every time the Span is retrieved, which I'd expect to happen constantly in typical use.

I'm still curious why Memory<T> was designed this way and, more importantly, what the intended use cases are given that design. Many people reaching for Span<T>/Memory<T> are after the speed benefits, and the object-typed field seems to discourage using it.
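For context on my question, the one use case I'm aware of is async code: a Span<byte> cannot live across an await (it may point into a stack frame that no longer exists), but a Memory<byte> can, with the Span obtained only at the point of use. A hedged sketch of that pattern:

```csharp
using System;
using System.Threading.Tasks;

class AsyncDemo
{
    // A Span<byte> parameter would not compile in an async method,
    // but Memory<byte> is fine because it is an ordinary struct.
    static async Task<int> ReadFirstAsync(Memory<byte> buffer)
    {
        await Task.Yield();        // Memory<byte> survives the await
        return buffer.Span[0];     // convert to Span only at the point of use
    }

    static async Task Main()
    {
        byte[] data = { 42, 1, 2 };
        Console.WriteLine(await ReadFirstAsync(data));  // 42
    }
}
```

Even granting that, I'd have expected the synchronous .Span access itself to be cheaper than my benchmarks show.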
