For C# binary, where's the anonymous class's type information stored?


I ran an experiment in C#. First I created a class library called "ClassLibrary1", with the code below:

public class ClassLibrary1
{
    public static void f()
    {
        var m = new { m_s = "abc", m_l = 2L };
        Console.WriteLine(m.GetType());
    }
}

Note that I removed the namespace generated by the IDE. Then I created a console application that references ClassLibrary1, with the code below (namespace also removed):

class Program
{
    static void Main()
    {
        var m = new {m_s = "xyz", m_l = 5L};
        Console.WriteLine(m.GetType());
        ClassLibrary1.f();
    }
}

When I run the program, it prints:

<>f__AnonymousType0`2[System.String,System.Int64]
<>f__AnonymousType0`2[System.String,System.Int64]
Press any key to continue . . .

The output suggests that the two anonymous classes, one defined in the class library and one in the console application, have the identical type.

My question is: how does a C# binary store type information for all the classes it contains? If it's stored in a global place, then when the exe is built with a reference to the dll, the same anonymous type information is there twice, so:

(1) Is name duplication an error that should be avoided?
(2) If it is not an error, as my test suggests, how can a C# binary store duplicate type information?
(3) At runtime, what is the rule for looking up type information to create real objects?

My example seems a bit confusing. Thanks.

Jon Hanna

(Note, I'm using the reversed prime ‵ character here where the grave accent character is in the code, since that has a special meaning in markdown, and they look similar. This may not work on all browsers).

My question is: how does a C# binary store type information for all the classes it contains?

The same way it stores any other class. There is no such thing as an anonymous type in .NET itself; it is something that C# (and other .NET languages) provides by compiling to what is, at the CIL level, a perfectly normal class with a perfectly normal name. At the CIL level there is nothing special about the name <>f__AnonymousType0‵2[System.String,System.Int64], though its being an illegal name in C#, VB.NET and many other languages has the advantage of preventing inappropriate direct use.
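To make the "perfectly normal class" point concrete, here is a minimal sketch that inspects the generated type through reflection. The names AnonymousTypeInspector, DefinitionOf and Describe are made up for illustration, and the exact generated type name may differ between compiler versions:

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical helper, not part of the original post.
public static class AnonymousTypeInspector
{
    // Returns the open generic definition behind an anonymous object,
    // i.e. the type whose name looks like <>f__AnonymousType0`2.
    public static Type DefinitionOf(object anonymous)
    {
        return anonymous.GetType().GetGenericTypeDefinition();
    }

    public static void Describe()
    {
        var m = new { m_s = "abc", m_l = 2L };
        Type definition = DefinitionOf(m);

        // The raw metadata name of the generated class.
        Console.WriteLine(definition.FullName);
        // Whether it is a class, and whether it is non-public (internal).
        Console.WriteLine(definition.IsClass);
        Console.WriteLine(definition.IsNotPublic);
        // Whether the compiler marked it as compiler-generated.
        Console.WriteLine(definition.IsDefined(typeof(CompilerGeneratedAttribute), false));

        // Looking the name up in the declaring assembly, like any other type.
        Type found = definition.Assembly.GetType(definition.FullName);
        Console.WriteLine(found == definition);
    }
}
```

Calling AnonymousTypeInspector.Describe() from a Main method prints the generated name followed by the results of each check.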

If it's stored in a global place, then when the exe is built with a reference to the dll, the same anonymous type information is there twice.

Try changing your Console.WriteLine(m.GetType()) to Console.WriteLine(m.GetType().AssemblyQualifiedName) and you'll see that they aren't the same type.
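As a rough sketch of that check, the helper below prints the pieces of a type's identity side by side. TypeIdentity and both of its methods are hypothetical names introduced here, not anything from the framework:

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class TypeIdentity
{
    // Formats the short name, the declaring assembly and the
    // assembly-qualified name of a type, one per line.
    public static string Describe(Type type)
    {
        return "Name: " + type.Name + Environment.NewLine
             + "Assembly: " + type.Assembly.GetName().Name + Environment.NewLine
             + "AssemblyQualifiedName: " + type.AssemblyQualifiedName;
    }

    // True when two types share a full name but live in different assemblies,
    // which is the situation the question ran into.
    public static bool SameNameDifferentAssembly(Type a, Type b)
    {
        return a.FullName == b.FullName && a.Assembly != b.Assembly;
    }
}
```

Calling Console.WriteLine(TypeIdentity.Describe(m.GetType())) in both Main and f() should show the same Name but a different Assembly line in each.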

Is name duplication an error that should be avoided?

No, because the CIL produced qualifies a class with its assembly (as AssemblyQualifiedName does) whenever it deals with classes from other assemblies.

If it is not an error, as my test suggests, how can a C# binary store duplicate type information?

The error was not in what you looked at, but in how you looked at it. There is no duplication.

At runtime, what is the rule for looking up type information to create real objects?

The type gets compiled directly into the calls, with the lookup happening at that point. Consider your f():

public static void f()
{
  var m = new { m_s = "abc", m_l = 2L };
  Console.WriteLine(m.GetType());
}

That is compiled to two things. First, the anonymous type goes into a list of definitions of anonymous types in the assembly, each of which is compiled into the equivalent of:

internal class SomeImpossibleName<M_SType, M_LType>
{
  private readonly M_SType _m_s;
  private readonly M_LType _m_l;
  public SomeImpossibleName(M_SType s, M_LType l)
  {
    _m_s = s;
    _m_l = l;
  }
  public M_SType m_s
  {
    get { return _m_s; }
  }
  public M_LType m_l
  {
    get { return _m_l; }
  }
  public override bool Equals(object value)
  {
    var compareWith = value as SomeImpossibleName<M_SType, M_LType>;
    if(compareWith == null)
      return false;
    if(!EqualityComparer<M_SType>.Default.Equals(_m_s, compareWith._m_s))
      return false;
    return EqualityComparer<M_LType>.Default.Equals(_m_l, compareWith._m_l);
  }
  public override int GetHashCode()
  {
    unchecked
    {
      return (-143687205 * -1521134295 + EqualityComparer<M_SType>.Default.GetHashCode(_m_s))
      * -1521134295 + EqualityComparer<M_LType>.Default.GetHashCode(_m_l);
    }
  }
  public override string ToString()
  {
    return new StringBuilder().Append("{ m_s = ")
      .Append((object)_m_s)
      .Append(", m_l = ")
      .Append((object)_m_l)
      .Append(" }")
      .ToString();
  }
}

Some things to note here:

  1. This uses a generic type, to save on the compiled size if you had a bunch of different classes with an m_s followed by an m_l of different types.
  2. This allows for a simple but reasonable comparison between objects of the same type, without which GroupBy and Distinct would not work.
  3. I called this SomeImpossibleName<M_SType, M_LType>; the real name would be <>f__AnonymousType0<<m_s>j__TPar, <m_l>j__TPar>. That is, not only is the main part of the name impossible in C#, but so are the names of the type parameters.
  4. If you have two methods that each do new Something{ m_s = "abc", m_l = 2L } they will both use this type.
  5. The constructor is optimised. While in C# generally calling var x = new Something{ m_s = "abc", m_l = 2L } is the same as calling var x = new Something(); x.m_s = "abc"; x.m_l = 2L; the code created for doing so with an anonymous type is actually the equivalent of var x = new Something("abc", 2L). This gives a performance benefit and, more importantly, allows anonymous types to be immutable even though this form of initialisation only works with named types if they are mutable.
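Notes 1, 2 and 5 can be checked from ordinary code. The sketch below assumes the compiler behaviour described above; AnonymousTypeNotes and its method names are invented for this illustration:

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original post.
public static class AnonymousTypeNotes
{
    // Note 1: property types are generic arguments, so two initializers with the
    // same property names but different property types share one generic definition.
    public static bool SharesGenericDefinition()
    {
        var a = new { m_s = "abc", m_l = 2L };
        var b = new { m_s = 1, m_l = 'x' };
        return a.GetType() != b.GetType()
            && a.GetType().GetGenericTypeDefinition() == b.GetType().GetGenericTypeDefinition();
    }

    // Note 2: Equals and GetHashCode compare property values, not references.
    public static bool HasValueEquality()
    {
        var a = new { m_s = "abc", m_l = 2L };
        var b = new { m_s = "abc", m_l = 2L };
        return !object.ReferenceEquals(a, b)
            && a.Equals(b)
            && a.GetHashCode() == b.GetHashCode();
    }

    // Note 5: values go in through the constructor and the properties have no setter.
    public static bool IsImmutable()
    {
        Type type = new { m_s = "abc", m_l = 2L }.GetType();
        bool noSetters = type.GetProperties().All(p => !p.CanWrite);
        bool twoArgumentConstructor = type.GetConstructors().Single().GetParameters().Length == 2;
        return noSetters && twoArgumentConstructor;
    }
}
```

The value equality in HasValueEquality is what lets GroupBy and Distinct treat two anonymous objects with equal property values as the same key.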

The second is the following CIL for the method:

.method public hidebysig static void f () cil managed 
{
  .maxstack 2
  .locals init
  (
  [0] class '<>f__AnonymousType0`2'<string, int64>
  )

  // Push the string "abc" onto the stack.
  ldstr "abc"

  // Push the number 2 onto the stack as an int32
  ldc.i4.2

  // Pop the top value from the stack, convert it to an int64 and push that onto the stack.
  conv.i8

  // Allocate a new object and call the '<>f__AnonymousType0`2'<string, int64> constructor.
  // (This call will make use of the string and long because that's how the constructor is defined.)
  newobj instance void class '<>f__AnonymousType0`2'<string, int64>::.ctor(!0, !1)

  // Store the object in the locals array, and then take it out again.
  // (Yes, this is a waste of time, but it often isn't and so the compiler sometimes adds in these
  // stores).
  stloc.0
  ldloc.0

  // Call GetType(), which will pop the current value off the stack (the object) and push on
  // the result of GetType().
  callvirt instance class [mscorlib]System.Type [mscorlib]System.Object::GetType()

  // Call WriteLine, which is a static method, so it doesn't need a System.Console item
  // on the stack, but which takes an object parameter from the stack.
  call void [mscorlib]System.Console::WriteLine(object)

  // Return
  ret
}

Now, some things to note here. Notice how all the calls to methods defined in the mscorlib assembly are prefixed with [mscorlib]. All calls across assemblies use this form, and so do all uses of classes across assemblies. As such, if two assemblies both have a <>f__AnonymousType0‵2 class, they will not collide: internal calls would use <>f__AnonymousType0‵2 and calls to the other assembly would use [Some.Assembly.Name]<>f__AnonymousType0‵2.

The other thing to note is the newobj instance void class '<>f__AnonymousType0‵2'<string, int64>::.ctor(!0, !1), which is the answer to your third question, about the rule for looking up type information at runtime to create real objects. It isn't looked up at runtime at all; instead, the call to the relevant constructor is determined at compile time.
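For contrast, here is a small sketch of what a genuine runtime lookup would look like, using reflection. It is only an illustration of the difference, not what the compiler emits, and RuntimeLookupContrast and its methods are hypothetical names:

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class RuntimeLookupContrast
{
    // Compile-time binding: the compiler picks the constructor and emits a
    // newobj instruction that refers to it directly.
    public static object CreateStatically()
    {
        return new { m_s = "abc", m_l = 2L };
    }

    // Runtime lookup: the constructor is searched for by argument types
    // when this line executes.
    public static object CreateByReflection(Type anonymousType)
    {
        return Activator.CreateInstance(anonymousType, "abc", 2L);
    }
}
```

Passing CreateStatically().GetType() to CreateByReflection yields a second object of the same type with equal property values, but only the reflection path performs any search at runtime.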

Conversely, there's nothing to stop you from having non-anonymous types with the exact same name in different assemblies. Add an explicit reference to mscorlib to a console application project, change its alias from the default global to global, mscorlib and then try this:

namespace System.Collections.Generic
{
  extern alias mscorlib;
  public class List<T>
  {
    public string Count
    {
      get{ return "This is a very strange “Count”, isn’t it?"; }
    }
  }
  class Program
  {
    public static void Main(string[] args)
    {
      var myList = new System.Collections.Generic.List<int>();
      var theirList = new mscorlib::System.Collections.Generic.List<int>();
      Console.WriteLine(myList.Count);
      Console.WriteLine(theirList.Count);
      Console.Read();
    }
  }
}

While there's a collision on the name System.Collections.Generic.List, the use of extern alias allows us to specify which assembly the compiler should look in for it, so we can use both versions side by side. Of course we wouldn't want to do this, as it's a lot of hassle and confusion, but compilers don't get hassled or confused in the same way.

Denis Yarkovoy

It is possible to have duplicate names in a .NET assembly, because metadata items (classes, fields, properties, etc.) are referenced internally by numeric metadata token, not by name.

Although the use of duplicate names is restricted in ECMA-335 (with several special exceptions), this possibility is exploited by a number of obfuscators and, probably, by compilers in cases where the name of the metadata item (a class, in your case) is not directly exposed to user code.
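The token-based referencing can be observed through reflection. This is a minimal sketch using Type.MetadataToken and Module.ResolveType; the MetadataTokens class and its method names are invented for illustration:

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class MetadataTokens
{
    // Returns the numeric token that identifies the type inside its own module.
    public static int TokenOf(Type type)
    {
        return type.MetadataToken;
    }

    // Resolves a token back to a type within the module that declares
    // anyTypeInSameModule. No type name is involved in this lookup.
    public static Type Resolve(Type anyTypeInSameModule, int token)
    {
        return anyTypeInSameModule.Module.ResolveType(token);
    }
}
```

For a type defined in the current module, Resolve(type, TokenOf(type)) should hand back the same type. The top byte of such a token is 0x02, which identifies the TypeDef table in the metadata.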

EDIT: CodeCaster is right in his answer: in your case the types reside in different assemblies, hence the duplicate names. I believe my point about duplicate names within the same assembly is valid, though it may not apply to this particular question.

CodeCaster

I removed namespace information

Irrelevant. Anonymous types for an assembly are always generated in the same namespace, namely the empty (global) one.

Furthermore, see the C# specification, section 7.6.10.6, Anonymous object creation expressions:

Within the same program, two anonymous object initializers that specify a sequence of properties of the same names and compile-time types in the same order will produce instances of the same anonymous type.

Confusingly, "program" here means "assembly". So:

how does a C# binary store type information for all the classes it contains? If it's stored in a global place, then when the exe is built with a reference to the dll, the same anonymous type information is there twice

That's right, but the types are unique per assembly. They can have the same type name because they're in different assemblies. You can see that by printing m.GetType().AssemblyQualifiedName, which will include the assembly name.
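The rule quoted from the specification can be checked inside a single assembly with a sketch like the following. SameShapeRule and its members are hypothetical names used only for this example:

```csharp
using System;

// Hypothetical helper, not part of the original post.
public static class SameShapeRule
{
    static Type FromFirstMethod()
    {
        return new { m_s = "abc", m_l = 2L }.GetType();
    }

    static Type FromSecondMethod()
    {
        // Same property names and types in the same order, different values.
        return new { m_s = "xyz", m_l = 5L }.GetType();
    }

    static Type WithReversedOrder()
    {
        // Same properties, but declared in the opposite order.
        return new { m_l = 5L, m_s = "xyz" }.GetType();
    }

    // Initializers in different methods of one assembly share a type.
    public static bool SameShapeSameType()
    {
        return FromFirstMethod() == FromSecondMethod();
    }

    // Changing the property order produces a different type.
    public static bool OrderMatters()
    {
        return FromFirstMethod() != WithReversedOrder();
    }
}
```

Both methods are expected to return true when compiled into one assembly. Across two assemblies, as in the question, even identical initializers give distinct types.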