Intersperse values into separate Vectors using generate

196 views Asked by At

I am trying to generate a tuple of Vectors by using a function that creates a custom data type (or a tuple) of values from an index. Here is an approach that achieves the desired result:

import Prelude hiding (map, unzip)
import Data.Vector hiding (map)
import Data.Array.Repa
import Data.Functor.Identity

data Foo = Foo {fooX :: Int, fooY :: Int}

unfoo :: Foo -> (Int, Int)
unfoo (Foo x y) = (x, y)

make :: Int -> (Int -> Foo) -> (Vector Int, Vector Int)
make n f = unzip $ generate n getElt where
  getElt i = unfoo $ f i

Except that I would like to do it in a single iteration per Vector, almost like it is shown below, but avoiding multiple evaluation of function f:

make' :: Int -> (Int -> Foo) -> (Vector Int, Vector Int)
make' n f = (generate n getElt1, generate n getElt2) where
  getElt1 i = fooX $ f i
  getElt2 i = fooY $ f i

Just as a note, I understand that Vector library supports fusion, and the first example is already pretty efficient. I need a solution to generate concept, other libraries have very similar constructors (Repa has fromFunction for example), and I am using Vectors here simply to demonstrate a problem.

Maybe some sort of memoizing of f function call would work, but I cannot think of anything.

Edit:

Another demonstration of the problem using Repa:

makeR :: Int -> (Int -> Foo) -> (Array U DIM1 Int, Array U DIM1 Int)
makeR n f = runIdentity $ do
  let arr = fromFunction (Z :. n) (\ (Z :. i) -> unfoo $ f i)
  arr1 <- computeP $ map fst arr
  arr2 <- computeP $ map snd arr
  return (arr1, arr2)

Same as with vectors, fusion saves the day on performance, but an intermediate array arr of tuples is still required, which I am trying to avoid.

Edit 2: (3 years later)

In the Repa example above it will not create an intermediate array, since fromFunction creates a delayed array. Instead it will be even worse, it will evaluate f twice for each index, one for the first array, second time for the second array. Delayed array must be computed in order to avoid such duplication of work.

2

There are 2 answers

0
lehins On BEST ANSWER

Looking back at my own question from a few years ago I can now easily show what I was trying to do back than and how to get it done.

In short, it can't be done purely, therefore we need to resort to ST monad and manual mutation of two vectors, but in the end we do get this nice and pure function that creates only two vectors and does not rely on fusion.

import Control.Monad.ST
import Data.Vector.Primitive
import Data.Vector.Primitive.Mutable

data Foo = Foo {fooX :: Int, fooY :: Int}

make :: Int -> (Int -> Foo) -> (Vector Int, Vector Int)
make n f = runST $ do
  let n' = max 0 n
  mv1 <- new n'
  mv2 <- new n'
  let fillVectors i
        | i < n' = let Foo x y = f i
                   in write mv1 i x >> write mv2 i y >> fillVectors (i + 1)
        | otherwise = return ()
  fillVectors 0
  v1 <- unsafeFreeze mv1
  v2 <- unsafeFreeze mv2
  return (v1, v2)

And the we use it in a similar fashion it is done with generate:

λ> make 10 (\ i -> Foo (i + i) (i * i))
([0,2,4,6,8,10,12,14,16,18],[0,1,4,9,16,25,36,49,64,81])
1
dfeuer On

The essential thing you're trying to write is

splat f = unzip . fmap f

which shares the results of evaluating f between the two result vectors, but you want to avoid the intermediate vector. Unfortunately, I'm pretty sure you can't have it both ways in any meaningful sense. Consider a vector of length 1 for simplicity. In order for the result vectors to share the result of f (v ! 0), each will need a reference to a thunk representing that result. Well, that thunk has to be somewhere, and it really might as well be in a vector.