This question is about designing abstractions around the Trees That Grow idiom.
Suppose I have a very simple syntax tree:
data Expr χ
= Lam (Lamfam χ) Name (Ty χ) (Expr χ)
| App (Appfam χ) (Expr χ)
| Unit (Ufam χ)
data Ty χ
= UnitTy (Utyfam χ)
| Arrow (Arrfam χ) (Ty χ) (Ty χ)
type family Lamfam χ
type family Appfam χ
type family Ufam χ
type family Utyfam χ
type family Arrfam χ
My question arises in looking for a way to place constraints on this set of type families. The paper provides one solution: constraint kinds:
type ForAllχ (c :: * -> Constraint) χ
= (c (LamFam χ), c (Appfam χ), c (Ufam χ), c (Utyfam χ), c (Arrfam χ))
Forallχ Show χ => Show (Ty χ) where
show = ...
Forallχ Show χ => Show (Exp χ)
show = ...
However, this only works if I want to place the same constraint on all types.
Suppose, for example, that I'm writing a bidirectional type checker algorithm which is generic over parameter χ.
infer env term = case term of
...
Lam ex name ty body -> do
ret_ty <- infer (insert name ty env) body
pure $ Arr ??? a ret_ty
...
We don't have anything to plug in where the ??? is. To solve this, maybe I want there to be a function lamToArr :: (Lamfam χ) -> (Arrfam χ) which allows the type to decide how extension information flows through the type checker.
infer lamtoArr env term = case term of
...
Lam ex name ty body -> do
ret_ty <- infer lamtoArr (insert name ty env) body
pure $ Arr lamtoArr a ret_ty
...
Of course, as add more functions this can get awkward, so ideally, I'd like to be able to use typeclasses - something like:
class Checker χ where
lamToArr :: (Lamfam χ) -> (Arrfam χ)
unitToTy :: (Ufam χ) -> (Utyfam χ)
However, this produces a type error:
• Couldn't match type: Lamfam χ0
with: Lamfam χ
Expected: Lamfam χ -> Arrfam χ
Actual: Lamfam χ0 -> Arrfam χ0
NB: ‘Lamfam’ is a non-injective type family
The type variable ‘χ0’ is ambiguous
• In the ambiguity check for ‘lamToArr’
To defer the ambiguity check to use sites, enable AllowAmbiguousTypes
When checking the class method:
lamToArr :: forall χ. Checker χ => Lamfam χ -> Arrfam χ
In the class declaration for ‘Checker’
|
41 | lamToArr :: (Lamfam χ) -> (Arrfam χ)
| ^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This error suggests that disabling the ambiguity check and/or making my type families injective (not quite sure what that means) could be a solution, but I wanted to check if any of this is a good solution to my problem in the first place, and if so then which of these two approaches is more what I'm looking for?
The general intent would be for the constraint do describe how information flows between types for a particular algorithm/stage, providing only what that particular stage needs. This would enable me to update or extend extensions to the tree without having to update the core algorithms.
In your case,
(Lamfam χ) -> (Arrfam χ)is ambiguous because there is no guarantee that eitherLamfam χorArrfam χis enough to tell whichχyou want at the call site. For example, a freshly parsed AST without annotations might set these both to somedata SrcLoc.There’s no problem with
AllowAmbiguousTypeshere. It says that you agree to write type annotations orTypeApplicationssometimes at call sites. You could make these type families injective usingTypeFamilyDependenciesannotations:Implicitly, a type family must be a function, whose input
χuniquely determines its outputλ. This just adds a dependency in the other direction, that for any outputλthat came from an application ofLamfam χ, it must be enough to tell what the inputχwas. This is a pretty strict requirement that negates some of the intended benefits of TTG (avoiding wrapper types). Also there are some expressiveness issues with injectivity checking in current GHC. Nevertheless, it does tend to make inference and error messages a bit more helpful when it works. A common idiom for making a family injective is to keep the input type around as a phantom type parameter. Basically, replace stuff like this:With stuff like this:
However, this is getting a bit astray. Normally with TTG, the typechecker wouldn’t be polymorphic in
χ, it would changeχfrom'Renamedto'Typecheckedby reannotating the tree. In that case, you have concrete types available, and you don’t need a typeclass.Your
class Checker χonly describes the set of AST states where you have enough information to run the typechecker. So I don’t know if it really pulls its weight as an abstraction, unless you want to be able to repeatedly re-typecheck the program between passes, say, to ensure nothing’s been broken by a rewrite.That said, there is one noteworthy solution in GHC for the problem of large sets of constraints, which you might find helpful. First, they start with a
Passtype to index the result of each pass.(There are only a few, because they only use TTG for the frontend right now, but if you’re writing a compiler using TTG from the get-go, you can certainly use it for more of the pipeline.)
Next, instead of indexing the AST and extension points with this type directly, like
HsExpr 'Parsed, they use a GADTGhcPass(likeSing Passwithsingletons).Finally they add a class
IsPass(likeSingIorKnownNat) which allows dispatching on the type in a uniform way.This is used especially for pretty-printing, and sometimes for constructing trees. It means that instead of carrying around a set of all the constraints they might need, like in
ForAllχ c χ, they can useIsPass p => HsExpr (GhcPass p). Withχprovided by a pattern-match, everyc χbecomes available. Notably, you can even use such constraints in an ad-hoc way, without any particular class:If you do want to write a lot of pass-polymorphic code, I find that it’s by far easier and clearer to use a data family or GADT in lieu of TTG. The compiler messages are much more helpful, and there are more tools available to reduce boilerplate, such as
StandaloneDeriving.