I have general question how should I go about and create proper macroexpand function or macro.

This is definition of my macro in LIPS interpreter (you can test it here https://jcubic.github.io/lips/)

function macro_expand(single) {
    return async function(code, args) {
        var env = args['env'] = this;
        async function traverse(node) {
            if (node instanceof Pair && node.car instanceof Symbol) {
                try {
                    var value = env.get(node.car);
                    if (value instanceof Macro && value.defmacro) {
                        var result = await value.invoke(node.cdr, args, true);
                        if (result instanceof Pair) {
                            return result;
                        }
                    }
                } catch (e) {
                    // ignore variables
                }
            }
            var car = node.car;
            if (car instanceof Pair) {
                car = await traverse(car);
            }
            var cdr = node.cdr;
            if (cdr instanceof Pair) {
                cdr = await traverse(cdr);
            }
            var pair = new Pair(car, cdr);
            return pair;
        }
        var new_code = code;
        if (single) {
            return quote((await traverse(code)).car);
        } else {
            while (true) {
                new_code = await traverse(code);
                if (code.toString() === new_code.toString()) {
                    break;
                }
                code = new_code;
            }
            return quote(new_code.car);
        }
    };
}

The problem is that this is that it's dummy macroexpand and ignore errors about variables, so it can't evaluate macro quasiquote, because it throw exception that it can't find variables. So I end up with quasiquote inside my expanded list (NOTE: the code in latest version don't even try to expand quasiquote because it's marked as not expandable).

What is approach of writing macro expand? Should I extend the evaluate function to work differently when using macro expand function?

I was testing how biwascheme create this function, https://www.biwascheme.org/ but it also don't work quite as I expect this function to work:

it expand:

biwascheme> (define-macro (foo name . body) `(let ((x ,(symbol->string name))) `(print ,x)))
biwascheme> (macroexpand '(foo bar))
=> ((lambda (x) (cons (quote print) (cons x (quote ())))) "bar")
biwascheme> 

I expect it to expand to:

(let ((x "bar")) (quasiquote (print (unquote x))))

My lisp return:

lips> (define-macro (foo name . body)
          `(let ((x ,(symbol->string name))) `(print ,x)))
;; macroexpand is a macro
lips> (macroexpand (foo bar))
(quasiquote (let ((x (unquote (symbol->string name))))
              (quasiquote (print (unquote x)))))

Even when I've set quasiquote as expandable it don't expand quasiquote because it can't find names so it throw exception that is ignored by macroexpand.

Any code even pseudo code will be helpful in writing this function or macro in my LISP.

EDIT:

I've started to updating my code to incorporate macro expand in evaluate function and made one change in define-macro macro. When macro expand was called it was not invoking the code first time, that was the problem.:

Before:

var rest = __doc__ ? macro.cdr.cdr : macro.cdr;
if (macro_expand) {
    return rest.car;
}
var pair = rest.reduce(function(result, node) {
    return evaluate(node, { env, dynamic_scope, error });
});

after:

var rest = __doc__ ? macro.cdr.cdr : macro.cdr;
var pair = rest.reduce(function(result, node) {
    return evaluate(node, eval_args);
});
if (macro_expand) {
    return quote(pair);
}

And it now working without problems, so my expand_macro macro is working correctly, and this is how you should write macro_expand.

EDIT2: I've further refactored the code and it turns out that I don't need to macro_exapnd code inside define-macro macro and only unquote the pair (delete data flag).

1 Answers

0
tfb On

Here is a toy macroexpander written in Racket which deals with CL-style macros. I have used Racket macros and other facilities in the process of writing this so it is not bootstrapped in terms of itself. Obviously such a thing can be done, but it would be significantly more hairy to do so.

The purpose of this is solely to demonstrate how a simple macroexpander might work: it is not in any sense meant to be something suitable for real use.

Special forms

First of all we need to deal with special forms. Special forms are things which have magic semantics. This expander has a very simple notion of how they work:

  • a special form is a compound form whose first element is a special operator;
  • each element of the rest of the form is either some kind of special thing which is not expanded, or it's expanded normally, which is done by saying expr in the definition;
  • the way this is done is by a fairly mindless pattern matcher, which is probably adequate only because there are a minute number of special forms the expander knows about.

So here is how special forms are defined, and the definition of three of them:

(define special-patterns (make-hasheqv))

(define (special-pattern? op)
  (and (symbol? op)
       (hash-has-key? special-patterns op)))

(define (special-pattern op)
  (hash-ref special-patterns op))

(define-syntax-rule (define-special-pattern (op spec ...))
  (hash-set! special-patterns 'op '(op spec ...)))

(define-special-pattern (quote thing))
(define-special-pattern (lambda args expr ...))
(define-special-pattern (define thing expr ...))
(define-special-pattern (set! thing expr))

Now we can ask if something is a special form (special pattern in the code) and retrieve its pattern:

> (special-pattern? 'lambda)
#t
> (special-pattern 'lambda)
'(lambda args expr ...)

Note that things like if are not special operators to the macro expander, even though they are in fact special: in a form like (if test then else) all of the subforms should be expanded, so there's no reason for the macroexpander to know about them. It's only things like lambda where some subforms should not be expanded that the macroexpander needs to know about.

Macro definitions

Macros are compound forms whose first element is recognised as naming a macro. For each such macro there is a macro expander function which is going to be responsible for expanding the form: that function is passed the whole form. There is a bit of syntax, which is define-macro, which wraps this function in a similar way that defmacro does in CL (but there is no &whole support, or support for arglist destructuring or any of that).

(define macros (make-hasheqv))

(define (macro? op)
  (and (symbol? op)
       (hash-has-key? macros op)))

(define (macro op)
  (hash-ref macros op))

(define-syntax-rule (define-macro (m arg ... . tail) form ...)
  (hash-set! macros 'm (lambda (whole)
                         (apply (lambda (arg ... . tail) form ...)
                                (rest whole)))))

With this we can define a simple macro: here are four definitions for let.

First of all here is the most rudimentary one: this doesn't even use define-macro but is what it turns into: the outer function gets the whole form, and then calls the inner one on the bit of it which is not the macro name. The inner function then laboriously turns (let ((x y) ...) ...) into ((lambda (x ...) ...) y ...), which is the right expansion for let. (Note that none of this deals with CL's (let (x) ...)).

(hash-set! macros 'let
           ;; this is what define-macro turns into
           (lambda (whole)
             (apply (lambda (bindings . body)
                      (cons (cons 'lambda
                                  (cons (map first bindings) body))
                            (map second bindings)))
                    (rest whole))))

Now here is that but using define-macro to reduce the pain:

(define-macro (let bindings . body)
  ;; Really primitive version
  (cons (cons 'lambda (cons (map first bindings) body))
        (map second bindings)))

And another version using list* to make things a little less horrible:

(define-macro (let bindings . body)
  ;; without backquote, but usung list* to make it a bit
  ;; less painful
  (list* (list* 'lambda (map first bindings) body)
         (map second bindings)))

And finally a version using backquote (aka quasiquote).

(define-macro (let bindings . body)
  ;; with backquote
  `((lambda ,(map first bindings) ,@body)
    ,@(map second bindings)))

Here is a version of a macro definition for prog1 which is broken due to hygiene failure:

(define-macro (prog1 form . forms)
  ;; Broken
  `(let ([r ,form])
     ,@forms
     r))

And here is how you need to write it to be more hygienic (although it is still unhygienic by Scheme's somewhat extreme standards):

(define-macro (prog1 form . forms)
  ;; Working
  (let ([rn (string->uninterned-symbol "r")])
    `(let ([,rn ,form])
       ,@forms
       ,rn)))

Note that this macro turns into another macro: it expands to let: the expander needs to deal with this (and it does).

The macro expander

The macro expander consists of two functions: expand-macros is the thing that actually does the expansion, and it dispatches to expand-special for special forms.

Here is expand-macros:

(define (expand-macros form)
  ;; expanding a form
  (if (cons? form)
      ;; only compound forms are even considered
      (let ([op (first form)])
        (cond [(macro? op)
               ;; it's a macro: call the macro function & recurse on the result
               (expand-macros ((macro op) form))]
              [(special-pattern? op)
               ;; it's special: use the special expander
               (expand-special form)]
              [else
               ;; just expand every element.
               (map expand-macros form)]))
      form))

Notes about this:

  • only compound forms can be macro forms;
  • this is a lisp-1, so the cars of compound forms are evaluated completely normally and can be macro forms: ((let (...) ...) ...) is fine;
  • macros are expanded recursively until there is nothing left to do.

Here is expand-special: this is much fiddlier than expand-macro and probably buggy: what it's trying to do is match the definition of a special form against the form it's been given.

(define (expand-special form)
  ;; expand a special thing based on a pattern.
  (match-let* ([(cons op body) form]
               [(cons pop pbody) (special-pattern op)])
    (unless (eqv? op pop)
      (error 'expand-special "~s is not ~s" pop op))
    (let pattern-loop ([accum (list op)]
                       [tail body]
                       [ptail pbody]
                       [context 'expr])
      (cond [(null? tail)
             (unless (or (null? ptail)
                         (eqv? (first ptail) '...))
               (error 'expand-special "~s is not enough forms for ~s"
                      body op))
             (reverse accum)]
            [(null? ptail)
             (error 'expand-special "~s is too many forms for ~s"
                    body op)]
            [else
             (match-let* ([(cons btf btr) tail]
                          [(cons ptf ptr) ptail]
                          [ellipsis? (eqv? ptf '...)]
                          [ctx (if ellipsis? context ptf)]
                          [ptt (if ellipsis? ptail ptr)])
               (pattern-loop (cons (if (eqv? ctx 'expr)
                                       (expand-macros btf)
                                       btf)
                                   accum)
                             btr ptt ctx))]))))

The fiddly bit here is the handling of ellipsis (...) which are used in the matcher to indicate 'more stuff here': I can't remember whether it can deal with ellipsis which are not the last thing in a pattern but I suspect strongly not. Note that although the underlying macro system also uses ellipsis these are unrelated: this is just relying on the fact that ... is a legal symbol name.

Note also that this recurses back into expand-macros where needed, of course.

Given these definitions we can now expand some macros:

> (expand-macros '(let ((x y)) x))
'((lambda (x) x) y)
> (expand-macros '(prog1 a b))
'((lambda (r) b r) a)

Note that Racket's printer doesn't print uninterned specially but the r above is uninterned.

With a simple tracing utility you can define a traced version of the macroexpander:

> (expand-macros '(let ([x 1]) (prog1 x (display "1"))))
[expand-macros (let ((x 1)) (prog1 x (display "1")))
 [expand-macros ((lambda (x) (prog1 x (display "1"))) 1)
  [expand-macros (lambda (x) (prog1 x (display "1")))
   [expand-special (lambda (x) (prog1 x (display "1")))
    [expand-macros (prog1 x (display "1"))
     [expand-macros (let ((r x)) (display "1") r)
      [expand-macros ((lambda (r) (display "1") r) x)
       [expand-macros (lambda (r) (display "1") r)
        [expand-special (lambda (r) (display "1") r)
         [expand-macros (display "1")
          [expand-macros display
           -> display]
          [expand-macros "1"
           -> "1"]
          -> (display "1")]
         [expand-macros r
          -> r]
         -> (lambda (r) (display "1") r)]
        -> (lambda (r) (display "1") r)]
       [expand-macros x
        -> x]
       -> ((lambda (r) (display "1") r) x)]
      -> ((lambda (r) (display "1") r) x)]
     -> ((lambda (r) (display "1") r) x)]
    -> (lambda (x) ((lambda (r) (display "1") r) x))]
   -> (lambda (x) ((lambda (r) (display "1") r) x))]
  [expand-macros 1
   -> 1]
  -> ((lambda (x) ((lambda (r) (display "1") r) x)) 1)]
 -> ((lambda (x) ((lambda (r) (display "1") r) x)) 1)]
'((lambda (x) ((lambda (r) (display "1") r) x)) 1)

A version of this code is available here.