Difficulty of implementing `case` expressions in a template-instantiation evaluator for a lazy functional language?

124 views Asked by At

I'm following "Implementing functional languages: a tutorial" by SPJ, and I'm stuck on Exercise 2.18 (page 70), reproduced below. This is in the chapter about a template-instantiation evaluator for the simple lazy functional language described in the book (similar to a mini Miranda/Haskell):

Exercise 2.18. Why is it hard to introduce case expressions into the template instantiation machine? (Hint: think about what instantiate would do with a case expression.)

The tutorial then goes on to cover an implementation of several less-general versions of destructuring structured data: an if primitive, a casePair primitive, and a caseList primitive. I haven't yet done the implementation for this section (Chapter 2 Mark 5), but I don't see why implementing these separately would be significantly easier than implementing a single case primitive.

The only plausible explanations I can offer is that the most generic case form is variadic in both number of alternatives (number of tags to match against) and arity (number of arguments to the structured data). All of the above primitives are fixed-arity and have a known number of alternatives. I don't see why this would make implementation significantly more difficult, however.

The instantiation of the case statement is straightforward:

  1. Instantiate the scrutinee.
  2. Instantiate the body expression of each alternative. (This may be somewhat wasteful if we substitute in unevaluated branches.) (I notice now this may be a problem, will post in an answer.)
  3. Encapsulate the result in a new node type, NCase, where:
    data Node = NAp Addr Addr
              | ...
              | NCase [(Int, [Name], Addr)]
    

Operationally, the reduction of the case statement is straightforward.

  1. Check if the argument is evaluated.
    • If not, make it the new stack and push the current stack to the dump. (Similar to evaluating the argument of any primitive.)
  2. If the argument is evaluated, then search for an alternative with a matching tag.
    • If no alternative with a matching tag is found, then throw an error (inexhaustive case branches).
  3. Instantiate the body of the matching alternative with the environment augmented with the structured data arguments. (E.g., in case Pack {0, 2} 3 4 in <0> a b -> a + b, instantiate a + b with environment [a <- 3, b <- 4])

A new node type would likely have to be introduced for case (NCase) containing the list of alternatives, but that's not too dissuading.


I found a GitHub repository @bollu/timi which seems to implement a template-instantiation evaluator also following this tutorial. There is a section called "Lack of lambda and case", which attributes the lack of a generic case statement to the following reason:

Case requires us to have some notion of pattern matching / destructuring which is not present in this machine.

However, in this tutorial there is no notion of pattern-matching; we would simply be matching by tag number (an integer), so I'm not sure if this explanation is valid.


Aside, partly for myself: a very similar question was asked about special treatment for case statements in the next chapter of the tutorial (concerning G-machines rather than template-instantiation).

1

There are 1 answers

0
Jonathan Lam On

I think I figured it out while I was expanding on my reasoning in the question. I'll post here for posterity, but if someone has a more understandable explanation or is able to correct me I'll be happy to accept it.


The difficulty lies in the fact that the instantiate step performs all of the variable substitutions, and this happens separately from evaluation (the step function). The problem is as bollu says in the GitHub repository linked in the original question: it is not easy to destructure structured data at instantiation time. This makes it difficult to instantiate the bodies of all of the alternatives.

To illustrate this, consider the instantiation of let expressions. This works like so:

  1. Instantiate each new binding expression.
  2. Augment the current environment with the new bindings.
  3. Instantiate the body with the augmented expression.

However, now consider the case of case expressions. What we want to do is:

  1. Instantiate the scrutinee. (Which should eventually evaluate to the form Pack {m, n} a0 a1 ... an)
  2. For each alternative (each of which has the form <m> b0 b1 ... bn -> body), augment the environment with the new bindings ([b0 <- a0, b1 <- a1, ..., bn <- an] and then instantiate the body of the alternative.)

The problem lies somewhere in between the two steps: calling instantiate on the scrutinee results in the instantiated Addr, but we don't readily have access to a1, a2, ... an to augment the environment with at instantiation time. While this might be possible if the scrutinee was a literal Pack value, if it needed further evaluation (e.g., was the evaluated result of a call to a supercombinator) then we would need to first evaluate it.


To solidify my own understanding, I'd like to answer the additional question: How do the primitives if, casePair, and caseList avoid this problem?

if trivially avoids this problem because boolean values are nullary. casePair and caseList avoid this problem by deferring the variable bindings using thunk(s); the body expressions get instantiated once the thunk is called, which is after the scrutinee is evaluated.


Possible solutions:

  1. I'm thinking that it might be possible to get around this if we define a destructuring primitive operator on structured data objects. I.e., (Pack {m, n} a0 a1 ... an).3 would evaluate to a3.

    In this case, what we could do is call instantiate scrut which would give us the address scrutAddr, and we could then augment the environment with new bindings [b0 <- (NAp .0 scrut), b1 <- (NAp .1 scrut), ..., bn <- (NAp .n scrut)].

  2. The issue seems to lie in the fact that instantiation (substitution) and evaluation are separated. If variables were not instantiated separately from evaluation but rather added to/looked up from the environment upon binding/usage, then this would not be a problem. This is as if we placed the bodies of the case statements into thunks to be instantiated after the scrutinee is evaluated, which is similar to what casePair and caseList do.

I haven't worked through either of these alternate solutions or how much extra work they would incur.