Why do local variables require initialization, but fields do not?

18.5k views Asked by At

If I create a bool within my class, just something like bool check, it defaults to false.

When I create the same bool within my method, bool check(instead of within the class), i get an error "use of unassigned local variable check". Why?

4

There are 4 answers

3
Eric Lippert On BEST ANSWER

Yuval and David's answers are basically correct; summing up:

  • Use of an unassigned local variable is a likely bug, and this can be detected by the compiler at low cost.
  • Use of an unassigned field or array element is less likely a bug, and it is harder to detect the condition in the compiler. Therefore the compiler makes no attempt to detect the use of an uninitialized variable for fields, and instead relies upon the initialization to the default value in order to make the program behavior deterministic.

A commenter to David's answer asks why it is impossible to detect the use of an unassigned field via static analysis; this is the point I want to expand upon in this answer.

First off, for any variable, local or otherwise, it is in practice impossible to determine exactly whether a variable is assigned or unassigned. Consider:

bool x;
if (M()) x = true;
Console.WriteLine(x);

The question "is x assigned?" is equivalent to "does M() return true?" Now, suppose M() returns true if Fermat's Last Theorem is true for all integers less than eleventy gajillion, and false otherwise. In order to determine whether x is definitely assigned, the compiler must essentially produce a proof of Fermat's Last Theorem. The compiler is not that smart.

So what the compiler does instead for locals is implements an algorithm which is fast, and overestimates when a local is not definitely assigned. That is, it has some false positives, where it says "I can't prove that this local is assigned" even though you and I know it is. For example:

bool x;
if (N() * 0 == 0) x = true;
Console.WriteLine(x);

Suppose N() returns an integer. You and I know that N() * 0 will be 0, but the compiler does not know that. (Note: the C# 2.0 compiler did know that, but I removed that optimization, as the specification does not say that the compiler knows that.)

All right, so what do we know so far? It is impractical for locals to get an exact answer, but we can overestimate not-assigned-ness cheaply and get a pretty good result that errs on the side of "make you fix your unclear program". That's good. Why not do the same thing for fields? That is, make a definite assignment checker that overestimates cheaply?

Well, how many ways are there for a local to be initialized? It can be assigned within the text of the method. It can be assigned within a lambda in the text of the method; that lambda might never be invoked, so those assignments are not relevant. Or it can be passed as "out" to anothe method, at which point we can assume it is assigned when the method returns normally. Those are very clear points at which the local is assigned, and they are right there in the same method that the local is declared. Determining definite assignment for locals requires only local analysis. Methods tend to be short -- far less than a million lines of code in a method -- and so analyzing the entire method is quite quick.

Now what about fields? Fields can be initialized in a constructor of course. Or a field initializer. Or the constructor can call an instance method that initializes the fields. Or the constructor can call a virtual method that initailizes the fields. Or the constructor can call a method in another class, which might be in a library, that initializes the fields. Static fields can be initialized in static constructors. Static fields can be initialized by other static constructors.

Essentially the initializer for a field could be anywhere in the entire program, including inside virtual methods that will be declared in libraries that haven't been written yet:

// Library written by BarCorp
public abstract class Bar
{
    // Derived class is responsible for initializing x.
    protected int x;
    protected abstract void InitializeX(); 
    public void M() 
    { 
       InitializeX();
       Console.WriteLine(x); 
    }
}

Is it an error to compile this library? If yes, how is BarCorp supposed to fix the bug? By assigning a default value to x? But that's what the compiler does already.

Suppose this library is legal. If FooCorp writes

public class Foo : Bar
{
    protected override void InitializeX() { } 
}

is that an error? How is the compiler supposed to figure that out? The only way is to do a whole program analysis that tracks the initialization static of every field on every possible path through the program, including paths that involve choice of virtual methods at runtime. This problem can be arbitrarily hard; it can involve simulated execution of millions of control paths. Analyzing local control flows takes microseconds and depends on the size of the method. Analyzing global control flows can take hours because it depends on the complexity of every method in the program and all the libraries.

So why not do a cheaper analysis that doesn't have to analyze the whole program, and just overestimates even more severely? Well, propose an algorithm that works that doesn't make it too hard to write a correct program that actually compiles, and the design team can consider it. I don't know of any such algorithm.

Now, the commenter suggests "require that a constructor initialize all fields". That's not a bad idea. In fact, it is such a not-bad idea that C# already has that feature for structs. A struct constructor is required to definitely-assign all fields by the time the ctor returns normally; the default constructor initializes all the fields to their default values.

What about classes? Well, how do you know that a constructor has initialized a field? The ctor could call a virtual method to initialize the fields, and now we are back in the same position we were in before. Structs don't have derived classes; classes might. Is a library containing an abstract class required to contain a constructor that initializes all its fields? How does the abstract class know what values the fields should be initialized to?

John suggests simply prohibiting calling methods in a ctor before the fields are initialized. So, summing up, our options are:

  • Make common, safe, frequently used programming idioms illegal.
  • Do an expensive whole-program analysis that makes the compilation take hours in order to look for bugs that probably aren't there.
  • Rely upon automatic initialization to default values.

The design team chose the third option.

9
David Arno On

Why do local variables require initialization, but fields do not?

The short answer is that code accessing uninitialised local variables can be detected by the compiler in a reliable way, using static analysis. Whereas this isn't the case of fields. So the compiler enforces the first case, but not the second.

Why do local variables require initialization?

This is no more than a design decision of the C# language, as explained by Eric Lippert. The CLR and the .NET environment do not require it. VB.NET, for example, will compile just fine with uninitialised local variables, and in reality the CLR initialises all uninitialised variables to default values.

The same could occur with C#, but the language designers chose not to. The reason is that initialised variables are a huge source of bugs and so, by mandating initialisation, the compiler helps to cut down on accidental mistakes.

Why don't fields require initialization?

So why doesn't this compulsory explicit initialisation happen with fields within a class? Simply because that explicit initialisation could occur during construction, through a property being called by an object initializer, or even by a method being called long after the event. The compiler cannot use static analysis to determine if every possible path through the code leads to the variable being explicitly initialised before us. Getting it wrong would be annoying, as the developer could be left with valid code that won't compile. So C# doesn't enforce it at all and the CLR is left to automatically initialise fields to a default value if not explicitly set.

What about collection types?

C#'s enforcement of local variable initialisation is limited, which often catches developers out. Consider the following four lines of code:

string str;
var len1 = str.Length;
var array = new string[10];
var len2 = array[0].Length;

The second line of code won't compile, as it's trying to read an uninitialised string variable. The fourth line of code compiles just fine though, as array has been initialised, but only with default values. Since the default value of a string is null, we get an exception at run-time. Anyone who's spent time here on Stack Overflow will know that this explicit/implicit initialisation inconsistency leads to a great many "Why am I getting a “Object reference not set to an instance of an object” error?" questions.

9
Yuval Itzchakov On

When I create the same bool within my method, bool check(instead of within the class), i get an error "use of unassigned local variable check". Why?

Because the compiler is trying to prevent you from making a mistake.

Does initializing your variable to false change anything in this particular path of execution? Probably not, considering default(bool) is false anyway, but it is forcing you to be aware that this is happening. The .NET environment prevents you from accessing "garbage memory", since it will initialize any value to their default. But still, imagine this was a reference type, and you'd pass an uninitialized (null) value to a method expecting a non-null, and get a NRE at runtime. The compiler is simply trying to prevent that, accepting the fact that this may sometimes result in bool b = false statements.

Eric Lippert talks about this in a blog post:

The reason why we want to make this illegal is not, as many people believe, because the local variable is going to be initialized to garbage and we want to protect you from garbage. We do in fact automatically initialize locals to their default values. (Though the C and C++ programming languages do not, and will cheerfully allow you to read garbage from an uninitialized local.) Rather, it is because the existence of such a code path is probably a bug, and we want to throw you in the pit of quality; you should have to work hard to write that bug.

Why doesn't this apply to a class field? Well, I assume the line had to be drawn somewhere, and local variables initialization are a lot easier to diagnose and get right, as opposed to class fields. The compiler could do this, but think of all the possible checks it would need to be making (where some of them are independent of the class code itself) in order to evaluate if each field in a class is initialized. I am no compiler designer, but I am sure it would be definitely harder as there are plenty of cases that are taken into account, and has to be done in a timely fashion as well. For every feature you have to design, write, test and deploy and the value of implementing this as opposed to the effort put in would be non-worthy and complicated.

0
Reactgular On

Good answers above, but I thought I'd post a much simpler/shorter answer for people to lazy to read a long one (like me).

Class

class Foo {
    private string Boo;
    public Foo() { /** bla bla bla **/ }
    public string DoSomething() { return Boo; }
}

Property Boo may or may not have been initialized in the constructor. So when it finds return Boo; it doesn't assume that it's been initialized. It simply suppresses the error.

Function

public string Foo() {
   string Boo;
   return Boo; // triggers error
}

The { } characters define the scope of a block of code. The compiler walks the branches of these { } blocks keeping track of stuff. It can easily tell that Boo was not initialized. The error is then triggered.

Why does the error exist?

The error was introduced to reduce the number of lines of code required to make source code safe. Without the error the above would look like this.

public string Foo() {
   string Boo;
   /* bla bla bla */
   if(Boo == null) {
      return "";
   }
   return Boo;
}

From the manual:

The C# compiler does not allow the use of uninitialized variables. If the compiler detects the use of a variable that might not have been initialized, it generates compiler error CS0165. For more information, see Fields (C# Programming Guide). Note that this error is generated when the compiler encounters a construct that might result in the use of an unassigned variable, even if your particular code does not. This avoids the necessity of overly-complex rules for definite assignment.

Reference: https://msdn.microsoft.com/en-us/library/4y7h161d.aspx