Compiler: transformation of intermediate representation variables to native code

111 views Asked by At

On books, articles, slides and tutorials about intermediate representation used in compilers, the Three Address Code (TAC) is a common one. My question is about the following examples of TAC

t0 = a
t1 = a + b
a = t2

In this example, we have three lines with two variables: a and b; and three temporaries: t0, t1 and t2. When converting such TACs to MIPS assembly, for instance, the first and last one could easily be something like the following:

lw t0, sp, a.offset
sw t2, sp, a.offset

But I must admit that I have no idea on how to translate the middle TAC since MIPS (and many other RISC processors) doesn't have an instruction capable of fetching two memory operands at the same time.

So my questions are: (1) how one would translate such TAC to a RISC instruction and; (2) why such TAC is commonly used when so many processors nowadays are RISC based ones? Is it a legacy from the time when processors used to be more CISC based and allowed multiple fetchs from memory?

OR

Maybe I have a wrong interpretation of what variable means on such TACs. If so, how should I interpret such variables in TACs?

1

There are 1 answers

0
sepp2k On

how one would translate such TAC to a RISC instruction

Load a into t1 (or move t0 into t1), load b into some other register, and then add that other register onto t1.

I've assumed here that the temporaries are assigned to registers of the same name and the variables are kept in memory because that seems to be the assumption you made. You can't assume that in general (at least not the first part since there may be more temporaries than registers - it's also common to store variables in registers where possible, but of course you don't have to do that), but the question wasn't about register allocation, so I'll not go into this here.

why such TAC is commonly used when so many processors nowadays are RISC based ones?

The most common processor architectures today are x86, x64 and ARM, all of which can execute r1 = r2 + r3 in a single instruction, so your premise isn't entirely accurate. But even if it were, the primary goal of TAC isn't to have as direct a mapping to assembly as possible.

Instead the goal is to be a useful format for the optimizations and analyses that compilers commonly perform before finally generating the target code. Making the code more complicated by adding extra moves in the IR would not further that goal, so it isn't done.