Should I expect Data.Vector to fuse the following function?

115 views Asked by At

I have recently started using Data.Vector. My understanding is that it should be able to take chains of vector operations and efficiently combine them via fusion.

In particular, I would expect these two functions to have similar performance characteristics (i.e. searchTopDown to fuse into something like searchTd).

import qualified Data.Vector.Unboxed as VU
import Data.Vector.Unboxed (fromList, (!))

--| Going down from i searches the vector for the element satisfying predicate 
searchTopDown :: VU.Vector Int -> Int -> (Int -> Bool) -> (Int, Int)
searchTopDown vec i pred =
    VU.head $ VU.dropWhile (not . pred . snd)
    $ VU.reverse $ VU.take (1 + i)
    $ VU.indexed vec

searchTd :: VU.Vector Int -> Int -> (Int -> Bool) -> (Int, Int)
searchTd vec i pred = let val = vec ! i
                    in
                      if pred val then (i,  val) else searchTd vec (i-1) pred

However, benchmarking this code shows performance difference in orders of magnitude.

testTd/150
                 lower bound    estimate    upper bound
OLS regression      4.51 μs     4.53 μs     4.56 μs
R² goodness-of-fit  1.000       1.000       1.000
Mean execution time 4.52 μs     4.53 μs     4.55 μs
Standard deviation  35.2 ns     50.0 ns     69.8 ns

topDown/150
                 lower  bound    estimate    upper bound
OLS regression     75.4 μs       76.6 μs     77.9 μs
R² goodness-of-fit  0.997        0.998       0.999
Mean execution time 74.6 μs      75.2 μs     76.1 μs
Standard deviation  1.58 μs      2.42 μs     3.75 μs

Questions: What prevents searchTopDown from being fused? What can I do to help it fuse? When should I expect fusion in general?

Code used for benchmarks:

import qualified Data.List as L
import Criterion.Main

vec :: VU.Vector Int
vec = fromList $ L.replicate 100 (-1) L.++ L.replicate 100 1

testTd :: Int -> Int
testTd i = fst $ searchTd vec i (<0)

testTopDown :: Int -> Int
testTopDown i = fst $ searchTopDown vec i (<0)


benchMain = defaultMain [
     bgroup "testTd"    [ bench "1"  $ whnf testTd 1
                        , bench "90"  $ whnf testTd 90
                        , bench "100"  $ whnf testTd 100
                        , bench "150" $ whnf testTd 150
      ],
      bgroup "topDown"  [ bench "1"  $ whnf testTopDown 1
                        , bench "90"  $ whnf testTopDown 90
                        , bench "100"  $ whnf testTopDown 100
                        , bench "150" $ whnf testTopDown 150
      ]
  ]

EDIT Following @Zeta suggestions (adding -O2) I get excellent performance from both functions. However, I experience this problem in a larger project so I changed benchmarking code to resemble it. searchTopDown seems to be linear in i for this case. New benchmarking code and results (in microsecs):

import Data.Vector.Unboxed (fromList, (!))
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VM
import Control.Monad.ST
import qualified Data.List as L

datStream :: [(Int, Int)]
datStream = L.zip [0..] $ L.replicate 100 (-1) L.++ L.replicate 100 1

vecUpdate v (iChg, vChg) = runST $ do
    mv <- VU.unsafeThaw v
    VM.modify mv (+ vChg) iChg
    VU.unsafeFreeze mv

vecs :: [VU.Vector Int]
vecs = let vecInit = VU.replicate 200 0 in L.drop 1 $ L.scanl' vecUpdate vecInit datStream

test f i = L.sum $ L.map (\x -> fst $ f x i (<0)) vecs
testTd = test searchTd
testTopDown = test searchTopDown

enter image description here

0

There are 0 answers