How can I express the type of 'takeWhile for vectors'?

439 views Asked by At

Haskell beginner here. I've defined the following types:

data Nat = Z | S Nat
data Vector a n where
    Nil :: Vector a Z
    (:-) :: a -> Vector a n -> Vector a (S n)
infixl 5 :-

I'm trying to write the function, takeWhileVector which behaves the same as takeWhile does on lists, but on Vectors.

My implementation is as follows:

takeWhileVector :: (a -> Bool) -> Vector a n -> Vector a m
takeWhileVector f Nil = Nil
takeWhileVector f (x :- xs) = if (f x) then (x :- takeWhileVector f xs) else Nil

This doesn't compile and produces the following error:

Could not deduce (m ~ 'S n0)
from the context (n ~ 'S n1)
  bound by a pattern with constructor
             :- :: forall a (n :: Nat). a -> Vector a n -> Vector a ('S n),
           in an equation for ‘takeWhileVector’
  at takeWhileVector.hs:69:20-26
  ‘m’ is a rigid type variable bound by
      the type signature for
        takeWhileVector :: (a -> Bool) -> Vector a n -> Vector a m
      at takeWhileVector.hs:67:20
Expected type: Vector a m
  Actual type: Vector a ('S n0)
Relevant bindings include
  takeWhileVector :: (a -> Bool) -> Vector a n -> Vector a m
    (bound at takeWhileVector.hs:69:1)
In the expression: (x :- takeWhileVector f xs)
In an equation for ‘takeWhileVector’:
    takeWhileVector f (x :- xs) = (x :- takeWhileVector f xs)

I would have thought that my type definition for takeWhileVector says the following:

For all types 'a' and 'n's of kind Nat, this function returns a 'Vector a m', where 'm' is kind Nat.

Do I need to change the type signature of takeWhileVector to be more specific so that it shows that the result is Vector a (m :: Nat)? Otherwise, how can I change this to have it compile?

2

There are 2 answers

6
chi On BEST ANSWER

The type you suggest can not be implemented, i.e. it is not inhabited:

takeWhileVector :: (a -> Bool) -> Vector a n -> Vector a m

Remember that the caller chooses the type variables a,n,m. This means that the caller can use your function as e.g.

takeWhileVector :: (a -> Bool) -> Vector a Z -> Vector a (S Z)

which is nonsense, since you can't produce some as if you got none at the beginning. Even more practically, the caller of this function is not meant to be able to choose how many results to have and pass a completely unrelated function -- that would disagree with the intended semantics of takeWhile.

I guess what you really mean is something like

takeWhileVector :: (a -> Bool) -> Vector a n -> exists m. Vector a m

except that exists is not valid Haskell syntax. Instead, you need something like

data SomeVector a where
   S :: Vector a m -> SomeVector a

takeWhileVector :: (a -> Bool) -> Vector a n -> SomeVector a
2
Benjamin Hodgson On

There's no way to statically know the length of the Vec that is returned by takeWhileVec: it depends on the values contained in the Vec, and the function you use. However, what I can do is give you a value which you can pattern-match on to learn the length of the Vec. Enter singleton values.

data Natty n where
    Zy :: Natty Z  -- pronounced 'zeddy', because I'm a limey
    Sy :: Natty n -> Natty (S n)  -- pronounced 'essy'

For a given n of kind Nat, there is exactly one (non-undefined) value of type Natty n. So if you know something about a Natty, you also know something about its associated type-level Nat.*

Why can't we just use Nat for this purpose? Haskell maintains a separation between values and types. The type-level Nat has no relation, other than a structural similarity, to the value-level Nat: the value S Z has a type of Nat, not S' Z' - so we have to use Natty, a second copy of Nat, to manually tie values and types together. (Real dependently-typed systems such as Agda allow you to reuse the same Nat for both value-level and type-level computations.)

*You can propagate knowledge the other way too, using type classes.


The plan is to return the output vector as well as its length as a Natty, with the types arranged in such a way that GHC understands that the Natty does indeed represent its length. You might first consider this variation on the example in your question:

-- takeWhile :: (a -> Bool) -> Vec a n -> (Natty m, Vec a m)

But this doesn't say the right thing: we're saying takeWhile can return any m of the caller's choice, which isn't true! It can only return the unique m determined by the function and the input vector. As I mentioned, this is un-knowable at compile-time, so we have to keep the length a secret from the compiler.

data AVec a = forall n. AVec (Natty n) (Vec a n)

AVec is an existential type: n appears on the right-hand side but not the left. This technique allows you to emulate a dependent pair type: the type of the Vec depends on the value of the Natty. Dependent pairs are useful whenever the static properties of data depend on dynamic information not available at compile-time.

We can write takeWhile straightforwardly now:

takeWhile :: (a -> Bool) -> Vec a n -> AVec a
takeWhile f Nil = AVec Zy Nil
takeWhile f (x :- xs) = case takeWhile f xs of
                             AVec n ys -> if f x
                                          then AVec (Sy n) (x :- ys)
                                          else AVec Zy Nil

We had to throw away static knowledge of the Vector's length, so how do you use an AVec with a function which places static requirements on the length? Because of the way AVec is constructed, we know that the Natty in the left slot represents the length of the vector in the right slot: they both feature the same (existentially hidden) type parameter n. So, by pattern-matching on the Natty, we learn the length of the Vec.

head :: Vec a (S n) -> a
head (x :- xs) = x

head' :: AVec a -> Maybe a
head' (AVec Zy Nil) = Nothing
head' (AVec (Sy n) xs) = Just (head xs)

In this example, we only care whether the vector is longer than one, so pattern matching on Sy is enough to prove to GHC that we should be allowed to use head. (See a related answer of mine for a more involved example of proving facts about AVecs.)


@chi mentioned a tantalising idea: you might want to show that the vector returned by takeWhile is not longer than the input vector.

takeWhile :: (a -> Bool) -> Vec a n -> AShorterVec a n

where AShorterVec is the type of vectors no longer than n. Our challenge is to write down the definition of AShorterVec.

Given two natural numbers, how can you be sure that the first is less-than-or-equal to the second? The relation is inductive. If the left operand is zero, then it's less-than-or-equal to any natural number automatically. Otherwise, a number is less-than-or-equal to another if its predecessor is less-than-or-equal to the other's predecessor. We can encode that as a constructive proof using a GADT.

data LTE n m where
    ZLte :: LTE Z m
    SLte :: LTE n m -> LTE (S n) (S m)

If n is less than m, you'll be able to come up with a value of LTE n m. If it's not, you can't. This is what the Curry-Howard isomorphism is all about.

Now we're ready to define AShorterVec: in order to construct a value of AShorterVec a n you need to be able to prove that it's shorter than n by supplying a value of LTE m n. When you pattern-match on the AShorterVec constructor, it gives you the proof back so you can compute with it.

data AShorterVec a n = forall m. AShorterVec (LTE m n) (Vec a m)

takeWhile compiles with only a minor change: we have to manipulate this proof object.

takeWhile :: (a -> Bool) -> Vec a n -> AShorterVec a n
takeWhile f Nil = AShorterVec ZLte Nil
takeWhile f (x :- xs) = case takeWhile f xs of
                             AShorterVec prf ys -> if f x
                                                   then AShorterVec (SLte prf) (x :- ys)
                                                   else AShorterVec ZLte Nil

An alternative way to give a type to takeWhile is to push the upper bound on the length into the return type itself, rather than carrying it around as data. This approach dispenses with any wranglings with Natty, proof terms like LTE, and existential quantifications.

data ShorterVec a n where
    SNil :: ShorterVec a n
    SCons :: a -> ShorterVec a n -> ShorterVec a (S n)

Once again, ShorterVec a n denotes the set of vectors no longer than n. The structure of ShorterVec is reminiscent of the finite sets, but translated from the world of naturals into the world of vectors. An empty vector is shorter than any length you care to name; a cons cell increases the smallest valid upper bound by one. Note that the value of n is never fully determined by a value of type ShorterVec, so you can give any valid upper bound to a ShorterVec. These two expressions are both well-typed:

ok1 = SCons 1 (SCons 3 SNil) :: ShorterVec Int (S (S (S Z)))
ok2 = SCons 1 (SCons 3 SNil) :: ShorterVec Int (S (S Z))

but this one is not:

-- notOk = SCons 1 (SCons 3 SNil) :: ShorterVec Int (S Z)  -- the vector is longer than our stated upper bound.

Using this datatype, you can write a beautifully simple version of takeWhile which looks exactly like the original list version:

takeWhile :: (a -> Bool) -> Vec a n -> ShorterVec a n
takeWhile f Nil = SNil
takeWhile f (x :- xs) = if f x
                        then SCons x (takeWhile f xs)
                        else SNil

Relaxing our assumptions about the type has made the function simpler to implement, but you pay the cost of having another type which you need to convert to and from. You can translate from ShorterVec back into the version which used a dependent pair by measuring the length.

toAVec :: ShorterVec a n -> AVec a
toAVec SNil = AVec Zy Nil
toAVec (SCons x xs) = case toAVec xs of
                           AVec n ys -> AVec (Sy n) (x :- ys)

We started out using singletons to tie types and values together, and existential types to wrap up runtime information about data with the data itself. Then, following @chi's idea, we encoded (one part of) the correctness of takeWhile into the type signature using a proof-term. Then we came up with a way to bake the length invariant into the return type directly, dispensing with the need to prove any theorems.

Once you've got the taste of dependently-typed programming on your tongue, it's hard to go back to the old way. Expressive type systems give you at least three advantages: you can write valid programs which other languages would disallow (or force you to duplicate code); you can write more meaningful types for the same functions, making stronger promises; and you can prove the correctness of your programs in a machine-checkable way.

Haskell's not set up for it, though. One problem is that singletons make binding types and values needlessly complex: the Nat-Natty distinction causes an explosion of boilerplate code, most of which I've spared you, to shuffle between the types and values. (Much of this boilerplate can be automated - that's what the singletons library gives you.) If we wanted to specify another aspect of the correctness of takeWhile - the fact that all of the output list's elements satisfy the predicate - we'd have to work exclusively with lists of singletons and type-level predicate functions. I also find it tedious to declare a new top-level type every time I want to quantify something existentially (you can write libraries to help with this, though they often result in other boilerplate) - we have the lack of type-level lambdas to thank for that. The other main difficulty is restrictions in what can be promoted to the type-level using DataKinds: GADTs and existential types can't be promoted, so for example you can't have a multi-dimensional array whose shape is expressed statically as a Vec Nat n. No real dependently typed system makes dependent types this difficult to use!