Which code in LLVM IR runs before "main()"?

3k views Asked by At

Does anyone know the general rule for exactly which LLVM IR code will be executed before main?

When using Clang++ 3.6, it seems that global class variables have their constructors called via a function in the ".text.startup" section of the object file. For example:

define internal void @__cxx_global_var_init() section ".text.startup" {
  call void @_ZN7MyClassC2Ev(%class.MyClass* @M)
  ret void
}

From this example, I'd guess that I should be looking for exactly those IR function definitions that specify section ".text.startup".

I have two reasons to suspect my theory is correct:

  • I don't see anything else in my LLVM IR file (.ll) suggesting that the global object constructors should be run first, if we assume that LLVM isn't sniffing for C++ -specific function names like "__cxx_global_var_init". So section ".text.startup" is the only obvious means of saying that code should run before main(). But even if that's correct, we've identified a sufficient condition for causing a function to run before main(), but haven't shown that it's the only way in LLVM IR to cause a function to run before main().

  • The Gnu linker, in some cases, will use the first instruction in the .text section to be the program entry point. This article on Raspberry Pi programming describes causing the .text.startup content to be the first body of code appearing in the program's .text section, as a means of causing the .text.startup code to run first.

Unfortunately I'm not finding much else to support my theory:

  • When I grep the LLVM 3.6 source code for the string ".startup", I only find it in the CLang-specific parts of the LLVM code. For my theory to be correct, I would expect to have found that string in other parts of the LLVM code as well; in particular, parts outside of the C++ front-end.

  • This article on data initialization in C++ seems to hint at ".text.startup" having a special role, but it doesn't come right out and say that the Linux program loader actually looks for a section of that name. Even if it did, I'd be surprised to find a potentially Linux-specific section name carrying special meaning in platform-neutral LLVM IR.

  • The Linux 3.13.0 source code doesn't seem to contain the string ".startup", suggesting to me that the program loader isn't sniffing for a section with the name ".text.startup".

1

There are 1 answers

0
Anton Korobeynikov On BEST ANSWER

The answer is pretty easy - LLVM is not executing anything behind the scenes. It's a job of the C runtime (CRT) to perform all necessary preparations before running main(). This includes (but not limited to) to static ctors and similar things. The runtime is usually informed about these objects via addresses of constructores being emitted in the special sections (e.g. .init_array or .ctors). See e.g. http://wiki.osdev.org/Calling_Global_Constructors for more information.