How would you abstract away the boilerplate in this pair of "similar shaped" datatypes

266 views Asked by At

General Question

I have a pair of datatypes that are two different ways of representing the same thing, one records the variable name in String, while the other one records the variable name in Int. Currently, they're both defined. However, I would like to describe just first representation, and the second one would be generated by some relation.

Specific Example

Concretely, the first one is the user defined version of a STLC term universe, while the second one is the de Bruijn–indexed version of the same thing:

{-# LANGUAGE RankNTypes, GADTs, DataKinds, TypeOperators, KindSignatures, PolyKinds, TypeFamilies, UndecidableInstances, FunctionalDependencies, FlexibleInstances  #-} 


-- * Universe of Terms * -- 

data Term :: Type -> * where 
    Var :: Id -> Term a
    Lam :: Id -> Type -> Term b -> Term (a :-> b)
    App :: Term (a :-> b) -> Term a -> Term b 

    Let :: Id -> Term a -> Term b -> Term b 
    Tup :: Term a -> Term b -> Term (a :*: b)
    Lft :: Term a -> Term (a :+: b)
    Rgt :: Term b -> Term (a :+: b)

    Tru :: Term Boolean
    Fls :: Term Boolean
    Uni :: Term Unit

data TermIn :: Type -> * where 
    VarI :: Idx -> Info -> TermIn a 
    LamI :: Type -> TermIn b -> Info -> TermIn (a :-> b)
    AppI :: TermIn (a :-> b) -> TermIn a -> TermIn b

    LetI :: TermIn a -> TermIn b -> Info -> TermIn b   
    TupI :: TermIn a -> TermIn b -> TermIn (a :*: b)

    TruI :: TermIn Boolean
    FlsI :: TermIn Boolean
    UniI :: TermIn Unit

-- * Universe of Types * --

data Type 
    = Type :-> Type
    | Type :*: Type
    | Type :+: Type
    | Boolean
    | Unit
    | Void

-- * Synonyms * --

type Id   = String          -- * variable name
type Idx  = Int             -- * de-brujin's index
data Info = Rec Id String   -- * store original name and extra info

There is already a relationship defined over Term and TermIn:

    class DeBruijnPair a b | a -> b, b -> a where 
        decode  :: b -> a
        encode  :: a -> b

    instance DeBruijnPair (Term a) (TermIn a) where
        decode = undefined
        encode = undefined 

Please note since the original name of Term is stored in TermIn, there's a one-to-one mapping of Term to TermIn, and back.

Restate the Question

Now you can see how much boiler plate is involved, which I'd like to get rid of using some elegant abstraction where I define Term and some function over types outputs TermIn. Just to provide even more context, I'm creating many pairs of such external and de Bruijn representations for different lambda-calculus formulations, and the one-to-one relation holds for all of them.

1

There are 1 answers

4
Twey On BEST ANSWER

The first step is to separate out the parts that are specific to each representation (Var, Lam, Let) from the rest of the definitions.

data AbstractTerm ∷ Type → ★ where
    App ∷ AbstractTerm (a :-> b) → AbstractTerm a → AbstractTerm b

    Tup ∷ AbstractTerm a → AbstractTerm b → AbstractTerm (a :*: b)

    Lft ∷ AbstractTerm a → AbstractTerm (a :+: b)
    Rgt ∷ AbstractTerm b → AbstractTerm (a :+: b)

    Tru, Fls ∷ AbstractTerm Boolean

    Uni ∷ AbstractTerm Unit

data Term ∷ Type → ★ where 
    Var ∷ Id → Term a
    Lam ∷ Id → Type → Term b → Term (a :-> b) 
    Let ∷ Id → Term a → Term b → Term b 

data TermIn ∷ Type → ★ where 
    VarI ∷ Idx → Info → TermIn a 
    LamI ∷ Type → TermIn b → Info → TermIn (a :-> b)
    LetI ∷ TermIn a → TermIn b → Info → TermIn b   

Then, you need to combine the ‘generic’ part with the concrete representation you want. There's a well-known trick to building inductive definitions into smaller chunks: you refactor the inductive call out of the data-type, making the types of sub-elements a parameter to the type (in this case, a function over your Type kind, since you need to keep track of the object-language type).

data AbstractTerm (t ∷ Type → ★) ∷ Type → ★ where
    App ∷ t (a :-> b) → t a → AbstractTerm t b

    Tup ∷ t a → t b → AbstractTerm t (a :*: b)

    Lft ∷ t a → AbstractTerm t (a :+: b)
    Rgt ∷ t b → AbstractTerm t (a :+: b)

    Tru, Fls ∷ AbstractTerm t Boolean

    Uni ∷ AbstractTerm t Unit

data Term (t ∷ Type → ★) ∷ Type → ★ where 
    Var ∷ Id → Term t a
    Lam ∷ Id → Type → t b → Term t (a :-> b)
    Let ∷ Id → t a → t b → Term t b 

data TermIn (t ∷ Type → ★) ∷ Type → ★ where 
    VarI ∷ Idx → Info → TermIn t a 
    LamI ∷ Type → t b → Info → TermIn t (a :-> b)
    LetI ∷ t a → t b → Info → TermIn t b   

To instantiate this type, you can use an inductive type definition that sums itself with the abstract type to obtain the parameter to give to the abstract type.

newtype ConcreteTermNamed ty
  = ConcreteTermNamed (Either (Term ConcreteTermNamed ty)
                              (AbstractTerm ConcreteTermNamed ty))

newtype ConcreteTermNameless ty
  = ConcreteTermNameless (Either (TermIn ConcreteTermNameless ty)
                                 (AbstractTerm ConcreteTermNameless ty))

This introduces a little additional noise to choose whether you want a representation-agnostic or -specific term at each level, plus the Haskell-mandated newtype wrapper, but otherwise leaves your term language as it was.

var ∷ ConcreteTermNamed (Boolean :*: Unit)
var = ConcreteTermNamed
        (Right (Tup (ConcreteTermNamed (Left (Var "x")))
                    (ConcreteTermNamed (Left (Var "y")))))

fun ∷ ConcreteTermNamed (Boolean :-> (Unit :*: Boolean))
fun = ConcreteTermNamed (Left
        (Lam "b" Boolean
           (ConcreteTermNamed (Right
              (Tup (ConcreteTermNamed (Right Uni))
                   (ConcreteTermNamed (Left (Var "b"))))))))

This trick can be used to sum any number of different inductive types, and is often used to break down a larger language into smaller, more modular sub-languages; for example, it might be good style to split your AbstractTerm up further into application, product, sum, and unit types, and then merge them all together by adding them into the sum type.