Are regular Haskell algebraic data types equivalent to context-free grammars? What about GADTs?

The syntax for algebraic data types is very similar to the syntax of Backus–Naur Form, which is used to describe context-free grammars. That got me thinking: if we treat the Haskell type checker as a parser for a language described by an algebraic data type (with nullary type constructors representing the terminal symbols, for example), is the set of all languages accepted this way the same as the set of context-free languages? Also, under this interpretation, what class of formal languages can GADTs accept?

Answer by chi (best answer)

First of all, data types do not always describe a set of strings (i.e., a language). That is, while a list type does, a tree type does not. One might counter that we could "flatten" the trees into lists and think of that as their language. Yet, what about data types like

data F = F Int (Int -> Int)

or, worse

data R = R (R -> Int)

?

Polynomial types (types without -> inside) roughly describe trees, which can be flattened (in-order visited), so let's use those as an example.

As you have observed, writing a CFG as a (polynomial) type is easy, since you can exploit recursion:

data A = A1 Int A | A2 Int B
data B = B1 Int B Char | B2

Above, A expresses { Int^m Char^n | m > n }: B contributes Int^k Char^k, and A prepends at least one more Int.
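The flattening mentioned above can be made concrete with a small sketch. The `Sym` type and the functions `flattenA` and `flattenB` are hypothetical helpers introduced only for illustration; they discard the payload values and keep one symbol per `Int` or `Char` field, visiting fields left to right.

```haskell
-- Terminal symbols of the flattened language (hypothetical helper type).
data Sym = I | C deriving (Eq, Show)

-- The two types from the answer.
data A = A1 Int A | A2 Int B
data B = B1 Int B Char | B2

-- Left-to-right traversal; only the shape of the term matters.
flattenA :: A -> [Sym]
flattenA (A1 _ a) = I : flattenA a
flattenA (A2 _ b) = I : flattenB b

flattenB :: B -> [Sym]
flattenB (B1 _ b _) = I : flattenB b ++ [C]
flattenB B2         = []

-- A sample term; it flattens to three I symbols followed by one C.
example :: A
example = A1 0 (A2 1 (B1 2 B2 'x'))
```

Every `B1` adds one `I` and one `C`, while `A1` and `A2` add only an `I`, which is why the number of `I` symbols always exceeds the number of `C` symbols.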

GADTs go well beyond context-free languages.

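{-# LANGUAGE GADTs #-}

-- Type-level Peano naturals (no values, used only as indices)
data Z
data S n

-- Lists whose length n is tracked in the type
data ListN a n where
  L1 :: ListN a Z
  L2 :: a -> ListN a n -> ListN a (S n)

-- Terminal symbols (a fresh module: this A and B are unrelated to the earlier ones)
data A
data B
data C

-- The three lists share the same length index n
data ABC where
  ABC :: ListN A n -> ListN B n -> ListN C n -> ABC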

Above, ABC expresses the (flattened) language A^n B^n C^n, which is not context-free: the shared index n forces the three lists to have the same length.
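As a rough illustration of how an `ABC` term flattens to a string, here is a minimal sketch. It departs from the snippet above in one respect: `A`, `B` and `C` are each given a single constructor so that example values can be written down. `Sym`, `lengthN` and `flattenABC` are hypothetical names introduced for this example.

```haskell
{-# LANGUAGE GADTs #-}

data Z
data S n

data ListN a n where
  L1 :: ListN a Z
  L2 :: a -> ListN a n -> ListN a (S n)

-- Assumption: one constructor each, so that terms can be built.
data A = A
data B = B
data C = C

data ABC where
  ABC :: ListN A n -> ListN B n -> ListN C n -> ABC

-- Terminal symbols of the flattened language (hypothetical helper type).
data Sym = SA | SB | SC deriving (Eq, Show)

-- Runtime length of an indexed list.
lengthN :: ListN a n -> Int
lengthN L1        = 0
lengthN (L2 _ xs) = 1 + lengthN xs

-- Emit one symbol per list element, in the order A, B, C.
flattenABC :: ABC -> [Sym]
flattenABC (ABC xs ys zs) =
  replicate (lengthN xs) SA
    ++ replicate (lengthN ys) SB
    ++ replicate (lengthN zs) SC

-- A well-typed term with n = 2.
example :: ABC
example = ABC (L2 A (L2 A L1)) (L2 B (L2 B L1)) (L2 C (L2 C L1))
```

A term such as `ABC (L2 A L1) L1 L1` is rejected by the type checker, because the length indices of the three lists disagree.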

You are pretty much unrestricted with GADTs, since it's easy to encode arithmetic with them. That is, you can build a type Plus a b c which is non-empty iff c = a + b on Peano naturals. You can also build a type Halt n m which is non-empty iff the Turing machine n halts on input m. So, you can build a language
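One possible encoding of the `Plus a b c` type is sketched below; the constructor names `PlusZ` and `PlusS` and the helper `plusSteps` are assumptions made for this example, and the "non-empty iff" reading ignores `undefined` and assumes `a`, `b` and `c` are built from `Z` and `S`.

```haskell
{-# LANGUAGE GADTs #-}

-- Type-level Peano naturals.
data Z
data S n

-- A value of type Plus a b c is a derivation of a + b = c.
data Plus a b c where
  PlusZ :: Plus Z b b                          -- 0 + b = b
  PlusS :: Plus a b c -> Plus (S a) b (S c)    -- (1 + a) + b = 1 + c

-- A witness that 2 + 1 = 3.
twoPlusOne :: Plus (S (S Z)) (S Z) (S (S (S Z)))
twoPlusOne = PlusS (PlusS PlusZ)

-- Number of PlusS steps in a derivation, which equals the first summand.
plusSteps :: Plus a b c -> Int
plusSteps PlusZ     = 0
plusSteps (PlusS p) = 1 + plusSteps p
```

Changing the result index of `twoPlusOne` to `S (S Z)` makes the definition ill-typed, since no combination of `PlusZ` and `PlusS` derives 2 + 1 = 2.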

{ A^n B^m proof | Turing machine n halts on input m, and proof proves it }

which is recursive (and not in any simpler class, roughly).

At the moment, I do not know whether you can describe recursively enumerable (computably enumerable) languages in GADTs. Even in the halting problem example, I have to include the "proof" term inside the GADT to make it work.

Intuitively, if you have a string of length n and you want to check it against a GADT, you can build all the GADT terms of depth n, flatten them, and then compare them to the string. This should prove that such a language is always recursive. However, existential types make this tree-building approach quite tricky, so I do not have a definite answer right now.
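For the simple polynomial example from earlier, the build-flatten-compare idea can be sketched as follows. This is only an illustration under strong assumptions: the `Int` and `Char` payloads are erased so that there are finitely many terms of each depth, and nothing here addresses the existential-type difficulty. `Sym`, `termsA`, `termsB` and `accepts` are hypothetical names.

```haskell
-- Terminal symbols of the flattened language (hypothetical helper type).
data Sym = I | C deriving (Eq, Show)

-- Shapes of the earlier A and B terms, with payloads erased (an assumption).
data A = A1 A | A2 B
data B = B1 B | B2

flattenA :: A -> [Sym]
flattenA (A1 a) = I : flattenA a
flattenA (A2 b) = I : flattenB b

flattenB :: B -> [Sym]
flattenB (B1 b) = I : flattenB b ++ [C]
flattenB B2     = []

-- All A terms of depth at most d (a lone B2 has depth 1).
termsA :: Int -> [A]
termsA d
  | d <= 0    = []
  | otherwise = map A1 (termsA (d - 1)) ++ map A2 (termsB (d - 1))

-- All B terms of depth at most d.
termsB :: Int -> [B]
termsB d
  | d <= 0    = []
  | otherwise = B2 : map B1 (termsB (d - 1))

-- Brute-force membership test. For this grammar, a term that flattens to a
-- string of length k has depth at most k + 1, so that bound is sufficient.
accepts :: [Sym] -> Bool
accepts s = any ((== s) . flattenA) (termsA (length s + 1))
```

With these definitions, `accepts [I, I, I, C]` holds while `accepts [I, C]` does not, matching the condition m > n. The depth bound is specific to this grammar; a general procedure would need its own argument for why a finite search suffices.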