Is it possible to memoize a recursion scheme? If so, how would you?
For example, the following uses anamophism and catamorphism
newtype Fix f = In (f (Fix f))
deriving instance (Eq (f (Fix f))) => Eq (Fix f)
deriving instance (Ord (f (Fix f))) => Ord (Fix f)
deriving instance (Show (f (Fix f))) => Show (Fix f)
out :: Fix f -> f (Fix f)
out (In f) = f
-- Catamorphism
type Algebra f a = f a -> a
cata :: (Functor f) => Algebra f a -> Fix f -> a
cata f = f . fmap (cata f) . out
-- Anamorphism
type Coalgebra f a = a -> f a
ana :: (Functor f) => Coalgebra f a -> a -> Fix f
ana f = In . fmap (ana f) . f
to solve the lattice paths problem:
latticePaths m n = cata countPathsAlgNoMemo (ana buildLattice (m, n))
-- recursive solution without dynamic programming
buildLattice :: (Int, Int) -> LeafBTreeF Int (Int, Int)
buildLattice (m, n)
| m == 0 && n == 0 = LeafBTreeLeafF 1
| m < 0 || n < 0 = LeafBTreeLeafF 0
| otherwise = LeafBTreeNodeF (m - 1, n) (m, n - 1)
countPathsAlgNoMemo :: LeafBTreeF Int Int -> Int
countPathsAlgNoMemo (LeafBTreeLeafF n) = n
countPathsAlgNoMemo (LeafBTreeNodeF a b) = a + b
It is inefficient because subproblems are recomputed instead of stored and reused. I would like to know if there is a way to store (or get the haskell compiler to store) previously computed subproblems.
I've had a look at some resources related to memoizing polymorphic functions (http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html, http://conal.net/blog/posts/memoizing-polymorphic-functions-part-two) but haven't been able understand how they might apply here.
NOTE: I'm specifically interested in whether apomorphism/paramorphism
and anamorphism/catamorphism
can be memoized (or if there is any other solution for storing subproblems using these recursion schemes). I understand that histomorphism and dynamorphism are suited to solve dynamic programming problems but for my purposes I want to limit my focus to apo/para or ana/cata.
My paramorphism
and apomorphism
:
-- Paramorphism
type RAlgebra f a = f (Fix f, a) -> a
para :: (Functor f) => RAlgebra f a -> Fix f -> a
para rAlg = rAlg . fmap fanout . out
where fanout t = (t, para rAlg t)
-- Apomorphism
type RCoalgebra f a = a -> f (Either (Fix f) a)
apo :: Functor f => RCoalgebra f a -> a -> Fix f
apo rCoalg = In . fmap fanin . rCoalg
where fanin = either id (apo rCoalg)
Update: See below on paramorphism/apomorphism compositions.
Note that memoizing
cata f
andana g
separately is pointless. The problem is that building a fresh lattice from scratch:is basically no more difficult than reading that structure from a pre-generated copy in memory. It would be like memoizing
replicate 1000000000 'x'
. It doesn't make any sense.Same with
cata countPathsAlg
. Calculating the number of paths is easy. The hard part is traversing the structure (whether to perform the calculation or to look it up as a key in a memo table). If you want to memoize a catamorphism efficiently, you need to represent large structures by simple keys. But we already have simple keys for our structures -- they're the values we pass to the anamorphism to generate those structures!In other words, to make this computation amenable to memoization, you need to eliminate the intermediate data structure by fusing the catamorphism and anamorphism.
This isn't too tough. Suppose we have:
Then, it follows that:
Note that this is a
refold
AKAhylo
from therecursion-schemes
package:For your example, we have the fused recursion:
which is easily memoized with, say, the
memoize
package:More generally, any hylomorphism can be memoized with:
Full memoization of a paramorphism/apomorphism combination isn't possible. There are two problems.
First, when an apo produces a
Left
, it can produce an arbitrarily complex structure. Even if you could memoize the root of this structure (as it's keyed by ana
value), there's no efficient way to memoize identical subproblems within that structure. For example, suppose an apo produces aLeft
rose tree consisting of 1000 identical branches each of a million nodes. The only way to discover that all 1000 branches are identical and can share the same solution is to traverse the million nodes in each branch to compare them.Second, a paramorphism calculation can, in general, depend on the complete structure at the current node, not just some "small" combination of previously calculated paramorphism results from local structure defined by the algebra. For example, consider a paramorphism on a rose tree that sums up the number of times a node's value appears in its subtree, and adds those counts together across all nodes, and suppose it's implemented as follows:
Note that once we have the result for a "subproblem", all we have is the total reference count in that tree. If this same tree appears under five different parent nodes, we can reuse this result, but we still need to search the whole tree for the five different parent node values to construct their counts.
This doesn't mean that memoization is useless. We can memoize the
Right
parts of the apomorphism, and memoization of the paramorphism results may still be useful even if full structure traversals are needed (e.g., forrefCount
above, partial memoization reduces the complexity from cubic to quadratic, I think).So, if you continue your expansion:
we can achieve the partial memoization:
which may be useful. The unmemoized parts will be the
para f t
when the apo produces aLeft
and the potential costly operation of the para on its first argument produced byapo g a
in theRight
branch.A catamorphism/apomorphism combination is also partially memoizable. We still have the problem that the apo can produce
Left
s with arbitrary, unmemoizable structure, but theRight
s of the apo can be fully memoized.