Three-address code and symbol tables

827 views Asked by At

I am working on a hobby retargetable C compiler in OCaml and I'm building it bottom up. So far I have an annotated AST type, abridged:

type 'e expr =
    | Int of 'e * int
    | Var of 'e * var
    | Neg of 'e * 'e expr
    | Add of 'e * 'e expr * 'e expr
    | Sub of 'e * 'e expr * 'e expr

and a three-address code type (again abridged):

type node = Copy of location * location
          | Unary of location * unary_op * location
          | Binary of location * location * binary_op * location

and location = Temp of int | Int of int | Var of string

and unary_op = Neg

and binary_op = Add | Sub

I have functions written that will transform an AST into a list of TAC nodes ignoring the annotations. Regarding this I have two questions:

  1. What do I do differently when transforming a type-annotated AST to a list of TAC nodes? Should I add annotations to TAC nodes too? This would allow me to later transform high level int/char types into lower-level ones like I16/I8.

  2. How do I handle scoping? What if I have two Vars that have the same name in different scopes?

1

There are 1 answers

3
rici On BEST ANSWER

How you pass annotations to your TAC is a very open question, but I'd agree that you probably want to do so.

One approach to scoping is name erasure; as you resolve scopes, you replace each unique identifier with a unique "name" (or directly with a reference to the symbol table entry.) (This is sometimes called gensymming, after the traditional Lisp gensym function.) More formally, it is α-reduction, a term taken from the λ calculus. This will work for languages, like C, in which names are not available to the runtime.

Languages in which runtime introspection has access to names (Python, Javascript) make this process somewhat more complicated, but you can still associate each use of a name with a specific scope. In languages where scopes can be dynamic (Perl, Lisp), you would have to introduce a name resolution operation into your TAC.