Strange IL code emitted by some compiler


I've been looking at some old, (Reflector) decompiled source code that I dug up. The DLL was originally compiled from Visual Basic .NET source, using .NET 2.0 - apart from that I have no information about the compiler anymore.

At some point something strange happened. There was a branch in the code that did not behave the way its decompiled condition said it should. To be exact, this was the branch:

[...]
if (item.Found > 0)
{
    [...]

Now, the interesting part was that if item.Found was -1, the scope of the if statement was entered. The type of item.Found was int.

To figure out what was going on, I went looking in the IL code and found this:

ldloc.3 
ldfld int32 Info::Found
ldc.i4.0 
cgt.un
stloc.s flag3
ldloc.s flag3
brfalse.s L_0024

Obviously Reflector was wrong here. The correct decompiled code should have been:

if ((uint)item.Found > (uint)0) 
{ ... }
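To make the difference concrete, here is a minimal C# sketch that puts the signed test Reflector displayed next to the unsigned comparison that cgt.un performs. The class and method names are made up for illustration and are not taken from the original DLL.

```csharp
using System;

static class CgtUnDemo
{
    // What Reflector displayed: a signed comparison (IL: cgt).
    public static bool SignedGreaterThanZero(int found) => found > 0;

    // What the IL actually does: an unsigned comparison (IL: cgt.un).
    // unchecked makes the bit-for-bit reinterpretation as uint explicit.
    public static bool UnsignedGreaterThanZero(int found) => unchecked((uint)found) > 0u;

    static void Main()
    {
        foreach (int found in new[] { -1, 0, 1 })
        {
            Console.WriteLine($"{found}: signed={SignedGreaterThanZero(found)}, unsigned={UnsignedGreaterThanZero(found)}");
        }
    }
}
```

For -1 the signed test is false while the unsigned one is true, because -1 reinterpreted as a uint is 4294967295.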

OK so far for context. Now for my question.

First off, I cannot imagine someone actually writing this code; IMO no-one with a sane mind makes the distinction between '-1' and '0' this way - which are the only two values that 'Found' can have.

So, that leaves me with the conclusion that the compiler does something I do not understand.

  • Why on earth / in what context would a compiler generate IL code like this? What's the benefit of this check (instead of ceq or bne.un - which is what I would have expected and is normally generated by C#)?
  • And related: what was the original source code most likely?

There are 3 answers

Matthew Watson On

As an experiment I compiled this VB code:

Dim test As Boolean
test = True
Dim x As Integer
x = test
If x Then Console.WriteLine("True")

The IL for the release version of this is:

.custom instance void [mscorlib]System.STAThreadAttribute::.ctor()
.entrypoint
.maxstack 2
.locals init (
    [0] bool test,
    [1] int32 x)
L_0000: ldc.i4.1 
L_0001: stloc.0 
L_0002: ldloc.0 
L_0003: ldc.i4.0 
L_0004: cgt.un 
L_0006: neg 
L_0007: stloc.1 
L_0008: ldloc.1 
L_0009: ldc.i4.0 
L_000a: cgt.un 
L_000c: brfalse.s L_0018
L_000e: ldstr "True"
L_0013: call void [mscorlib]System.Console::WriteLine(string)
L_0018: ret 

Note the use of cgt.un

Reflector's interpretation as C# is:

bool test = true;
int x = (int) -(test > false);
if (x > 0x0)
{
    Console.WriteLine("True");
}

And as VB:

Dim test As Boolean = True
Dim x As Integer = CInt(-(test > False))
If (x > &H0) Then
    Console.WriteLine("True")
End If

Therefore I conclude the generated code is related to the conversion of the VB Boolean to a numeric value.
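As a rough illustration of that conversion, the following C# sketch mimics the cgt.un / neg pair step by step. The helper name is hypothetical, and the assumption that a CLR bool sits on the stack as 1 or 0 is the usual case rather than a guarantee.

```csharp
using System;

static class VbBoolConversion
{
    // Mirrors the IL above: (test cgt.un 0) yields 1 or 0, then neg turns 1 into -1.
    public static int ToVbInteger(bool test)
    {
        int raw = test ? 1 : 0;                               // assumed stack form of a CLR bool
        int normalized = unchecked((uint)raw) > 0u ? 1 : 0;   // ldc.i4.0; cgt.un
        return -normalized;                                   // neg
    }

    static void Main()
    {
        Console.WriteLine(ToVbInteger(true));   // prints -1
        Console.WriteLine(ToVbInteger(false));  // prints 0
    }
}
```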

Jon Hanna On

Let's first consider that there are, as you say, two possible values: -1 and 0. There's the question of what should be done if 42 ends up in there. Whether that is impossible (you are correct in your statement) or just about possible (the value acts like a VARIANT_BOOL, in which -1 is the normal true value but all non-zero values should be treated as true), it's worth considering either way. And it makes sense to treat 42 the same as we treat -1; that is, it makes sense to treat all non-zero values the same.

And even if there is absolutely no other possible non-zero value than -1 it still generalises to "test is non-zero" which is a very common case elsewhere, so it still makes sense to consider this a "test is non-zero" case. This is especially so if the compiler doesn't know -1 is the only possible non-zero value (very likely).

Now there is the question of whether to branch directly on the value (with brfalse, brtrue etc.) or to do a boolean operation and then branch on the result. Generally both the C# and VB.NET compilers will produce a boolean value and then branch on that in debug builds:

Simple Code:

public void TestBool(bool x)
{
  if(x)
    throw new ArgumentOutOfRangeException();
}

Debug CIL:

  nop
  ldarg.1
  ldc.i4.0
  ceq
  stloc.0
  ldloc.0
  brtrue.s NoError
  newobj instance void [mscorlib]System.ArgumentOutOfRangeException::.ctor()
  throw
NoError:
  ret

Release CIL:

  ldarg.1
  brfalse.s NoError
  newobj instance void [mscorlib]System.ArgumentOutOfRangeException::.ctor()
  throw
NoError:
  ret

The extra steps of essentially computing x == false into a local before doing the branching aid debugging. Similar effects are sometimes seen in release code, though less often.

So, for this reason we have a comparison being done before the branch in your code, rather than just a branch.

Now there is another question, of whether we should test that the value is zero or test that the value is not zero; either is equivalent much as:

if(x)
  DoSomething();

And

if(!x)
{
}
else
  DoSomething();

Are equivalent.

For this reason ceq could have been used, with the subsequent branch being appropriate for the case where item.Found is 0. But it's if anything more sensible to use cne, with the subsequent branch being appropriate for the case where item.Found is not 0.

But there's no such CIL instruction as cne, or anything which comparably tests if something is not equal. Generally to do "check not equal" we do a sequence ceq, ldc.i4.0, ceq; check two values are equal and then check that the result of that check is false.

Luckily, in the common case where the value we are comparing against is 0, we don't need cne, because an unsigned "greater than 0" is true for every bit pattern except 0. That makes cgt.un logically equivalent to a hypothetical cne in this case, and so the obvious choice when we want to test that something isn't zero.
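A small C# sketch of that equivalence follows. It emulates both instruction sequences with ordinary C# expressions (the method names are invented for this example) and prints them next to a plain != 0 test for a handful of sample values.

```csharp
using System;

static class NonZeroTest
{
    // Emulates "ldc.i4.0; cgt.un": unsigned greater-than against zero.
    public static bool ViaCgtUn(int value) => unchecked((uint)value) > 0u;

    // Emulates "ldc.i4.0; ceq; ldc.i4.0; ceq": equal to zero, then negate that result.
    public static bool ViaDoubleCeq(int value)
    {
        int isZero = value == 0 ? 1 : 0;
        return isZero == 0;
    }

    static void Main()
    {
        int[] samples = { int.MinValue, -1, 0, 1, 42, int.MaxValue };
        foreach (int value in samples)
        {
            bool expected = value != 0;
            Console.WriteLine($"{value}: cgt.un={ViaCgtUn(value)}, ceq/ceq={ViaDoubleCeq(value)}, !=0 is {expected}");
        }
    }
}
```

All three columns agree for every sample, including the negative values that a signed cgt would get wrong.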

And hence while IYO "no-one with a sane mind makes the distinction between '-1' and '0' this way" it's a very sane way indeed to test for non-zero generally. And indeed, cgt.un appears often as just such a non-zero test.

And related: what was the original source code most likely?

If item.Found Then
  'More stuff
End If

Which is equivalent to the C#

if(item.Found != 0)
{
  //More stuff
}
Hans Passant On

Looks quirky but this is related to previous versions of Visual Basic, the generation that ended with VB6. It had a very different Boolean type representation, a VARIANT_BOOL. This still is a factor in VB.NET due to its need to support legacy code.

The value representation for True was different, it was -1. False is 0 like it is in .NET.

While that looks like a very quirky choice as well (just about any other language uses 1 to represent True), there was a very good reason for it: it makes the distinction between the logical and the mathematical And and Or operators disappear. Which is pretty nice, one more thing a programmer doesn't have to learn. That this is a learning obstacle is pretty evident from the kind of code most any C# programmer writes: they blindly apply && or || in their if() statements, even when it is not a good idea to do so. These operators are expensive due to the required short-circuiting branch in the machine code. If the left operand is poorly predicted by the processor's branch prediction then you'll easily lose a bunch of CPU cycles due to the pipeline stall.
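Here is a minimal C# sketch of that idea, with plain ints standing in for VB6 Booleans (the class and constant names are hypothetical). With True stored as -1, that is all bits set, the bitwise operators give the same answers as the logical ones. Not is where a 1-based True would show the difference most clearly.

```csharp
using System;

static class VbTruth
{
    public const int VbTrue = -1;  // all bits set
    public const int VbFalse = 0;

    // Any non-zero value counts as true.
    public static bool IsTrue(int value) => value != 0;

    static void Main()
    {
        Console.WriteLine(VbTrue & VbFalse);  // prints 0   (True And False = False)
        Console.WriteLine(VbTrue | VbFalse);  // prints -1  (True Or False = True)
        Console.WriteLine(~VbTrue);           // prints 0   (Not True = False)
        Console.WriteLine(~1);                // prints -2  (bitwise Not of a 1-based True is still non-zero)
    }
}
```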

Nice but not without problems: And and Or always evaluate both left and right operands. That has a knack for tripping exceptions; sometimes you really do need short-circuiting. VB.NET added the AndAlso and OrElse operators to fix that problem.
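The same contrast exists in C#, where & on bool operands always evaluates both sides (like And) and && short-circuits (like AndAlso). The sketch below uses made-up method names and a null string to show the kind of exception eager evaluation can trip.

```csharp
using System;

static class ShortCircuitDemo
{
    // Like VB's And: both operands are always evaluated.
    public static bool EagerCheck(string s) => s != null & s.Length > 0;

    // Like VB's AndAlso: the right operand is skipped when the left is false.
    public static bool LazyCheck(string s) => s != null && s.Length > 0;

    static void Main()
    {
        Console.WriteLine(LazyCheck(null));  // prints False, s.Length is never touched

        try
        {
            Console.WriteLine(EagerCheck(null));
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("EagerCheck threw NullReferenceException");
        }
    }
}
```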

So cgt.un makes sense: it can handle both a .NET Boolean value and a legacy value. It doesn't care whether the True value is -1 or 1. Nor does it care whether the variable or expression is actually Boolean; using a non-Boolean there is permitted in VB.NET with Option Strict Off.