I'm following "Implementing functional languages: a tutorial" by SPJ, and I'm stuck on Exercise 2.18 (page 70), reproduced below. This is in the chapter about a template-instantiation evaluator for the simple lazy functional language described in the book (similar to a mini Miranda/Haskell):
Exercise 2.18. Why is it hard to introduce
case
expressions into the template instantiation machine? (Hint: think about whatinstantiate
would do with acase
expression.)
The tutorial then goes on to cover an implementation of several less-general versions of destructuring structured data: an if
primitive, a casePair
primitive, and a caseList
primitive. I haven't yet done the implementation for this section (Chapter 2 Mark 5), but I don't see why implementing these separately would be significantly easier than implementing a single case
primitive.
The only plausible explanations I can offer is that the most generic case
form is variadic in both number of alternatives (number of tags to match against) and arity (number of arguments to the structured data). All of the above primitives are fixed-arity and have a known number of alternatives. I don't see why this would make implementation significantly more difficult, however.
The instantiation of the case
statement is straightforward:
- Instantiate the scrutinee.
Instantiate the body expression of each alternative. (This may be somewhat wasteful if we substitute in unevaluated branches.)(I notice now this may be a problem, will post in an answer.)- Encapsulate the result in a new node type,
NCase
, where:data Node = NAp Addr Addr | ... | NCase [(Int, [Name], Addr)]
Operationally, the reduction of the case
statement is straightforward.
- Check if the argument is evaluated.
- If not, make it the new stack and push the current stack to the dump. (Similar to evaluating the argument of any primitive.)
- If the argument is evaluated, then search for an alternative with a matching tag.
- If no alternative with a matching tag is found, then throw an error (inexhaustive case branches).
- Instantiate the body of the matching alternative with the environment augmented with the structured data arguments. (E.g., in
case Pack {0, 2} 3 4 in <0> a b -> a + b
, instantiatea + b
with environment[a <- 3, b <- 4]
)
A new node type would likely have to be introduced for case (NCase
) containing the list of alternatives, but that's not too dissuading.
I found a GitHub repository @bollu/timi which seems to implement a template-instantiation evaluator also following this tutorial. There is a section called "Lack of lambda and case", which attributes the lack of a generic case
statement to the following reason:
Case
requires us to have some notion of pattern matching / destructuring which is not present in this machine.
However, in this tutorial there is no notion of pattern-matching; we would simply be matching by tag number (an integer), so I'm not sure if this explanation is valid.
Aside, partly for myself: a very similar question was asked about special treatment for case
statements in the next chapter of the tutorial (concerning G-machines rather than template-instantiation).
I think I figured it out while I was expanding on my reasoning in the question. I'll post here for posterity, but if someone has a more understandable explanation or is able to correct me I'll be happy to accept it.
The difficulty lies in the fact that the
instantiate
step performs all of the variable substitutions, and this happens separately from evaluation (thestep
function). The problem is as bollu says in the GitHub repository linked in the original question: it is not easy to destructure structured data at instantiation time. This makes it difficult to instantiate the bodies of all of the alternatives.To illustrate this, consider the instantiation of
let
expressions. This works like so:However, now consider the case of
case
expressions. What we want to do is:Pack {m, n} a0 a1 ... an
)<m> b0 b1 ... bn -> body
), augment the environment with the new bindings ([b0 <- a0, b1 <- a1, ..., bn <- an]
and then instantiate the body of the alternative.)The problem lies somewhere in between the two steps: calling
instantiate
on the scrutinee results in the instantiatedAddr
, but we don't readily have access toa1
,a2
, ...an
to augment the environment with at instantiation time. While this might be possible if the scrutinee was a literalPack
value, if it needed further evaluation (e.g., was the evaluated result of a call to a supercombinator) then we would need to first evaluate it.To solidify my own understanding, I'd like to answer the additional question: How do the primitives
if
,casePair
, andcaseList
avoid this problem?if
trivially avoids this problem because boolean values are nullary.casePair
andcaseList
avoid this problem by deferring the variable bindings using thunk(s); the body expressions get instantiated once the thunk is called, which is after the scrutinee is evaluated.Possible solutions:
I'm thinking that it might be possible to get around this if we define a destructuring primitive operator on structured data objects. I.e.,
(Pack {m, n} a0 a1 ... an).3
would evaluate toa3
.In this case, what we could do is call
instantiate scrut
which would give us the addressscrutAddr
, and we could then augment the environment with new bindings[b0 <- (NAp .0 scrut), b1 <- (NAp .1 scrut), ..., bn <- (NAp .n scrut)]
.The issue seems to lie in the fact that instantiation (substitution) and evaluation are separated. If variables were not instantiated separately from evaluation but rather added to/looked up from the environment upon binding/usage, then this would not be a problem. This is as if we placed the bodies of the case statements into thunks to be instantiated after the scrutinee is evaluated, which is similar to what
casePair
andcaseList
do.I haven't worked through either of these alternate solutions or how much extra work they would incur.