How do compilers store hundreds of variables in only a few registers?

530 views Asked by At

Say you have a virtual machine that only has 4 registers, A, B, C, and D. How do compilers store so many variables with only a limited amount of space?

Are there multiple ways of doing this, or is there a single solid way that this is accomplished? What's the fancy scientific-term for this, and also is it considered a complicated problem?

3

There are 3 answers

1
Jens On BEST ANSWER

I recommend you read Programming Language Pragmatics or the Dragon Books, in particular the chapters on register allocation.

In short, there are many approaches to handling this situation. Commonly the compiler builds an intermediate representation which can be an abstract machine with an infinite number of registers or an SSA form. When code is generated for a particular target hardware/OS, these abstract registers are assigned to real registers or to stack locations, based on criteria like use frequency or life span of the abstract registers (i.e. your original variables).

There are different approaches depending on the chosen intermediate representation (see for example here or here). The problem can be difficult if you are striving for an optimal solution (i.e. keep as many variables for as long as possible in actual registers without spilling them onto the stack), but there are simpler approaches like "linear scan register allocation" when time is critical e.g. in just-in-time compilations.

If you'd like to dig into code, perhaps take a look at the LLVM infrastructure and their register allocation and this presentation.

0
amalloy On

Stuff that doesn't fit in registers (which is most stuff) is stored in memory, and moved into a register only when there's a need to do operations on it. You probably want to read the Wikipedia article on register allocation, which is exactly the name for the thing you are asking about.

0
Matt On

This is the subject of register allocation. Basically what is done is the compiler computes for each variable, what other variables are in use at the same time. The compiler will then construct an interference graph, where there is a node for each variable used in the program, and an edge between all nodes that are live at the same time. This then becomes a graph coloring problem, where the colors correspond to the registers available on the machine.

As you may know, graph coloring is an NP-Complete problem, so compilers implement a simple, yet very effective heuristic. Basically, they find the highest degree node in the graph that has fewer than k edges, where k is the number of registers on the machine. We then remove this node along with all of its edges and recursively color the remaining graph. If no such node exists, we take the highest degree node, and spill it, meaning we store it on the stack instead, and retry the coloring process with that node removed.