C# Weird inline behavior for generic methods - possible bug


For some reason, this generic method is not inlined into its caller unless the caller contains a loop. What could explain this? For non-generic methods, inlining happens in both cases, with and without a loop.

Code:

using System;
using System.Runtime.CompilerServices;
using SharpLab.Runtime;

[JitGeneric(typeof(int))]
public static class GenericOps<T> where T : unmanaged
{
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool Less(T left, T right)
    {
        if (typeof(T) == typeof(byte)) return (byte)(object)left < (byte)(object)right;
        if (typeof(T) == typeof(sbyte)) return (sbyte)(object)left < (sbyte)(object)right;
        if (typeof(T) == typeof(ushort)) return (ushort)(object)left < (ushort)(object)right;
        if (typeof(T) == typeof(short)) return (short)(object)left < (short)(object)right;
        if (typeof(T) == typeof(uint)) return (uint)(object)left < (uint)(object)right;
        if (typeof(T) == typeof(int)) return (int)(object)left < (int)(object)right;
        if (typeof(T) == typeof(ulong)) return (ulong)(object)left < (ulong)(object)right;
        if (typeof(T) == typeof(long)) return (long)(object)left < (long)(object)right;
        if (typeof(T) == typeof(float)) return (float)(object)left < (float)(object)right;
        if (typeof(T) == typeof(double)) return (double)(object)left < (double)(object)right;
        return default;
    }
}

[JitGeneric(typeof(int))]
public static class C<T> where T : unmanaged
{      
    public static bool M1(T a, T b)
    {
        return GenericOps<T>.Less(a, b);      
    }
        
    public static bool M2(T a, T b)
    {
        // Empty loop: the body never runs, but its presence changes the inlining result.
        for (int i = 0; i < 0; i++) {}
            
        return GenericOps<T>.Less(a, b);    
    }        
}

JIT output (disassembled using SharpLab):

// All the type checks are elided because T is known when the method is JIT-compiled.
// The generated code is identical to that of a direct int < int comparison.
GenericOps`1[[System.Int32, System.Private.CoreLib]].Less(Int32, Int32)
    L0000: cmp ecx, edx
    L0002: setl al
    L0005: movzx eax, al
    L0008: ret

// No Inlining
C`1[[System.Int32, System.Private.CoreLib]].M1(Int32, Int32)
    L0000: mov rax, GenericOps`1[[System.Int32, System.Private.CoreLib]].Less(Int32, Int32)
    L000a: jmp rax

// Direct Inline
C`1[[System.Int32, System.Private.CoreLib]].M2(Int32, Int32)
    L0000: cmp ecx, edx
    L0002: setl al
    L0005: movzx eax, al
    L0008: ret

Takeaways:

  • Oddly, the call in C.M1() is not inlined even though inlining would make the generated code smaller. I've tested this with different methods as well.
  • This happens only when the method is generic; a direct non-generic implementation is always inlined.
  • It has something to do with the type switches in the generic method. If the generic method does not contain these type switches, it is inlined in both cases (M1 and M2), even without the AggressiveInlining attribute, as long as the method is short.
  • The loop seems to trigger some heuristic that causes the inlining to happen.
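
For comparison, the non-generic baseline mentioned in the second takeaway might look like the sketch below. The names IntOps and D are hypothetical and are not part of the original code. That this call is inlined is the observation reported above; the snippet only shows the shape of the code and does not prove it.

```csharp
// Hypothetical non-generic counterpart of GenericOps<T>.Less (illustrative name).
public static class IntOps
{
    public static bool Less(int left, int right)
    {
        return left < right;
    }
}

// Hypothetical caller mirroring C<T>.M1: a plain forwarding call with no loop.
public static class D
{
    public static bool M1(int a, int b)
    {
        return IntOps.Less(a, b);
    }
}
```

Pasting this next to the generic version in SharpLab makes it possible to compare the disassembly of D.M1 and C<int>.M1 side by side.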

The questions that arise from this example are:

  • Is this behavior intentional or is it a bug?
  • Is there a way to guarantee the inlining of the Less() method, without using weird loops in the caller method?
  • Does this behavior also occur in the System.Numerics.Vector<T> type, since it uses the same generic type switches that get optimized away?

1 answer

l33t (best answer)

Given that this was fixed in .NET 5, I would call it a bug. Verified in SharpLab with the following .NET versions:

  • x64 (.NET 5) - inlined
  • Core CLR v5.0.321.7212 on x86 - inlined
  • Desktop CLR v4.8.4261.00 on x86/amd64 - NOT inlined
  • Core CLR v4.700.20.20201 on x86 - NOT inlined
  • Core CLR v4.700.19.46205 on x86 - NOT inlined

So, to answer your questions:

  1. Yes, it is likely a bug.
  2. You cannot guarantee inlining, especially not for generic (<T>) types. The rules and heuristics are probably quite complex.
  3. Clues to the answer can be seen here. Microsoft knows the JIT well, so it would be somewhat surprising if a high-performance type like Vector<T> suffered from inlining problems.
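
Since the result depends on the runtime version, code that relies on this inlining may want to know which runtime it is running on. The following is a minimal sketch, assuming the ".NET 5 or later" threshold taken from the version list above; the RuntimeCheck class and its method names are hypothetical. It only reports the runtime version and does not itself verify that inlining occurred.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (illustrative name): reports whether the process runs on
// a runtime where the inlining behavior above was observed to be fixed.
public static class RuntimeCheck
{
    // Assumption: the "Major >= 5" threshold comes from the SharpLab results
    // listed in the answer, not from official documentation.
    public static bool IsNet5OrLater()
    {
        return Environment.Version.Major >= 5;
    }

    // Human-readable description of the runtime, for example for logging.
    public static string Describe()
    {
        return RuntimeInformation.FrameworkDescription
            + " (inlining fix expected: " + IsNet5OrLater() + ")";
    }
}
```

Calling RuntimeCheck.Describe() at startup gives a quick hint about whether to expect the M1 case to be inlined; checking the actual disassembly remains the only reliable confirmation.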