What constitutes codata in the context of programming?

1.5k views Asked by At

This is a corecursive algorithm, because with each iteration it calls itself on data that is greater then what it had before:

iterate f x =  x : iterate f (f x)

It is similar to tail recursion accumulator style, but its accumulator is implicit instead of being passed as an argument. And it would be infinite if it weren't for lazyness. So is codata just the result of a value constructor in WHNF, kind of like (a, thunk)? Or is codata rather a mathematical term from category theory, which hasn't a useful representation in the programming domain?

Follow-up question: Is value recursion just a synonym for corecursion?

1

There are 1 answers

2
K. A. Buhr On BEST ANSWER

I think answering your questions requires a lot of explanation, so here's a big long answer with specific answers to your questions at the end.

Data and codata have formal mathematical definitions in terms of category theory, so it's not just a matter of how they are used in a program (i.e., not just the "application context" you mentioned in the comments). It may seem this way in Haskell because the language's features (specifically, non-termination and laziness) end up blurring the distinction, so in Haskell, all data is also codata and vice versa, but it doesn't have to be this way, and there are languages that make the distinction clearer.

Both data and codata do have useful representations in the programming domain, and those representations give rise to natural relationships to recursion and corecursion.

It's quite hard to explain these formal definitions and representations without quickly getting technical, but roughly speaking, a data type for, say, a list of integers, is a type L together with a constructor function:

makeL :: Either () (Int, L) -> L

that is somehow "universal" in that it can fully represent any such construction. (Here, you want to interpret the LHS type Either () (Int, L) to mean that a list L is either the empty list Left () or a pair Right (h, t) consisting of the head element h :: Int and a tail list t :: L.)

To start with a counterexample, L = Bool is not the data type we're looking for, because even though you could write:

foo :: Either () (Int, Bool) -> Bool
foo (Left ()) = False
foo (Right (h, t)) = True

to "construct" a Bool, this can't fully represent any such construction. For example, the two constructions:

foo (Right (1, foo (Left ()))) = True
foo (Right (2, foo (Left ()))) = True

give the same Bool value, even though they used different integers, so this Bool value is insufficient to fully represent the construction.

In contrast, the type [Int] is an appropriate data type because the (almost trivial) constructor function:

makeL :: Either () (Int, [Int]) -> [Int]
makeL (Left ()) = []
makeL (Right (h, t)) = h : t

fully represents any possible construction, creating a unique value for each one. So, it's somehow the "natural" construction for the type signature Either () (Int, L) -> L.

Similarly, a codata type for a list of integers would be a type L together with a destructor function:

eatL :: L -> Either () (Int, L)

that is somehow "universal" in the sense that it can represent any possible destruction.

Again, starting with a counterexample, a pair (Int, Int) is not the codata type we're looking for. For example, with the destructor:

eatL :: (Int, Int) -> Either () (Int, (Int, Int))
eatL (a, b) = Right (a, (b, a))

we can represent the destruction:

let p0 = (1, 2)
    Right (1, p1) = eatL p0
    Right (2, p2) = eatL p1
    Right (1, p3) = eatL p2
    Right (2, p4) = eatL p3
...continue indefinitely or stop whenever you want...

but we can't represent the destruction:

let p0 = (?, ?)
    Right (1, p1) = eatL p0
    Right (2, p2) = eatL p1
    Right (3, p3) = eatL p2
    Left () = eatL p3

On the other hand, in Haskell, the list type [Int] is an appropriate codata type for a list of integers, because the destructor:

eatL :: [Int] -> Either () (Int, [Int])
eatL (x:xs) = Right (x, xs)
eatL [] = Left ()

can represent any possible destruction (including both finite or infinite destructions, thanks to Haskell's lazy lists).

(As evidence that this isn't all hand-waving and in case you want to relate it back to the formal math, in technical category theory terms, the above is equivalent to saying that the list-like endofunctor:

F(A) = 1 + Int*A   -- RHS equivalent to "Either () (Int,A)"

gives rise to a category whose objects are constructor functions (AKA F-algebras) 1 + Int*A -> A. A data type associated with F is an initial F-algebra in this category. F also gives rise to another category whose objects are destructor functions (AKA F-coalgebras) A -> 1 + Int*A. A codata type associated with F is a final F-coalgebra in this category.)

In intuitive terms, as suggested by @DanielWagner, a data type is a way of representing any construction of a list-like object, while a codata type is a way of representing any destruction of a list-like object. In languages where data and codata are different, there's a fundamental asymmetry -- a terminating program can only construct a finite list, but it can destruct (the first part of) an infinite list, so data must be finite, but codata can be finite or infinite.

This leads to another complication. In Haskell, we can use makeL to construct an infinite list like so:

myInfiniteList = let t = makeL (Right (1, t)) in t

Note that this would not be possible if Haskell didn't allow lazy evaluation of non-terminating programs. Because we can do this, by the formal definition of "data", a Haskell list-of-integer data type must also include infinite lists! That is, Haskell "data" can be infinite.

This probably conflicts with what you might read elsewhere (and even with the intuition that @DanielWagner provided), where "data" is used to refer to finite data structures only. Well, because Haskell is a little weird and because infinite data isn't allowed in other languages where data and codata are distinct, when people talk about "data" and "codata" (even in Haskell) and are interested in drawing a distinction, they may use "data" to refer to finite structures only.

The way recursion and corecursion fit in to this is that the universality properties naturally give us "recursion" to consume data and "corecursion" to produce codata. If L is a list-of-integer data type with constructor function:

makeL :: Either () (Int, L) -> L

then one way of consuming a list L to produce a Result is to define a (non-recursive) function:

makeResult :: Either () (Int, Result) -> Result

Here, makeResult (Left ()) gives the intended result for an empty list, while makeResult (Right (h, t_result)) gives the intended result for a list whose head element is h :: Int and whose tail would give the result t_result :: Result.

By universality (i.e., the fact that makeL is an initial F-algebra), there exists a unique function process :: L -> Result that "implements" makeResult. In practice, it will be implemented recursively:

process :: [Int] -> Result
process [] = makeResult (Left ())
process (h:t) = makeResult (Right (h, process t))

Conversely, if L is a list-of-integer codata type with destructor function:

eatL :: L -> Either () (Int, L)

then one way of producing a list L from a Seed is to define a (non-recursive) function:

unfoldSeed :: Seed -> Either () (Int, Seed)

Here, unfoldSeed should produce a Right (x, nextSeed) for each desired integer, and produce Left () to terminate the list.

By universality (i.e., the fact that eatL is a final F-coalebra), there exists a unique function generate :: Seed -> L that "implements" unfoldSeed. In practice, it will be implemented corecursively:

generate :: Seed -> [Int]
generate s = case unfoldSeed s of
  Left () -> []
  Right (x, s') -> x : generate s'

So, with all that said, here are the answers to your original questions:

  • Technically, iterate f is corecursive because it's the unique codata-producing function Int -> [Int] that implements:

    unfoldSeed :: Seed -> Either () (Int, Seed)
    unfoldSeed x = Right (x, f x)
    

    by means of generate as defined above.

  • In Haskell, corecursion that produces codata of type [a] relies on laziness. However, strict codata representations are possible. For example, the following codata representation works fine in Strict Haskell and can be safely fully evaluated.

    data CoList = End | CoList Int (() -> CoList)
    

    The following corecursive function produces a CoList value (and I made it finite just for fun -- it's easy to produce infinite codata values, too):

    countDown :: Int -> CoList
    countDown n | n > 0 = CoList n (\() -> countDown (n-1))
                | otherwise = End
    
  • So, no, codata isn't just the result of values in WHNF with form (a, thunk) or similar and corecursion is not synonymous with value recursion. However, WHNF and thunks provide one possible implementation and are the implementation-level reason that a "standard" Haskell list data type is also a codata type.