As long as .NET is installed, I understand that the IL code lets .NET take care of the different types of CPUs. But for native code, as far as I understand (and please correct me if I'm wrong), the code has to be compiled for every processor type independently, and a "CPU dispatcher" at the beginning of the executable chooses which version to execute, depending on the exact type of CPU. So there will be some "simplest" code that runs on any Intel/AMD CPU, plus some more optimized versions.
Hence my question: how many versions of the code will .NET Native create, and how many will NGEN? Will NGEN produce only one version, without even the "simplest" one? Or will it include that too, so that the code still works if copied onto a different machine? Will .NET Native produce many versions, making the executable several times larger than it needs to be? Or is most of the code shared, with only the optimizable parts written for whatever they could be optimized for?
EDIT
When I mention "type of processor", I don't mean x86 vs. ARM. I mean variants within x86 (or x64), such as processors supporting MMX, 3DNow!, SSE ...
Both NGEN and .NET Native generate a single set of code per architecture. They don't change code generation behavior based on the model of the user's processor.
In NGEN's case, this might change. A future version of NGEN could detect the processor model, since it's running on the end user's machine, and generate the best code for that model.
.NET Native works like C++. All code is generated in advance, not on the user's machine, and therefore the code needs to run on every supported processor model. .NET Native only runs on Win8.1 and above. So, the minimum system requirements for Win8.1 define the processor features available to .NET Native. For x86, Win8+ requires SSE2 to be available.
One interesting side note about MSVCRT:
.NET uses the Microsoft Visual C++ runtime for a number of utility functions. Some of the functions inside the C++ runtime do check for processor features and execute different code paths for different processors. (In particular, I know x86 memcpy does this.) These checks happen at runtime, and the code remains portable across different processor models.
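To make the runtime-check idea concrete, here is a minimal sketch in C of how a library routine could pick a code path once, based on processor features. It is not the actual MSVCRT implementation. The names `copy_baseline`, `copy_tuned`, `cpu_has_sse2` and `select_copy` are made up for illustration, the "tuned" path is only a stand-in for real SSE2 code, and the feature check assumes GCC or Clang on x86 (other compilers fall back to the baseline path).

```c
#include <assert.h>
#include <stddef.h>

/* Signature shared by every copy implementation (hypothetical names). */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Baseline path: plain byte loop that runs on any processor. */
void *copy_baseline(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Stand-in for a tuned path: works in 16-byte blocks, then copies the tail.
   A real implementation would use SSE2 loads and stores here. */
void *copy_tuned(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;
    for (; i + 16 <= n; i += 16)
        for (size_t j = 0; j < 16; j++)
            d[i + j] = s[i + j];
    for (; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Runtime feature check. Assumption: GCC/Clang builtins on x86;
   anything else reports "not available" and gets the baseline path. */
int cpu_has_sse2(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

/* The "dispatcher": decides at run time, so one binary stays portable. */
copy_fn select_copy(void) {
    return cpu_has_sse2() ? copy_tuned : copy_baseline;
}
```

A caller would do `copy_fn copy = select_copy();` once and then call `copy(dst, src, n)`. Both paths ship in the same binary, which is why this kind of check does not break portability across processor models.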