I have recently started using Data.Vector. My understanding is that it should be able to take chains of vector operations and efficiently combine them via fusion.
In particular, I would expect these two functions to have similar performance characteristics (i.e. searchTopDown to fuse into something like searchTd).
import qualified Data.Vector.Unboxed as VU
import Data.Vector.Unboxed (fromList, (!))
--| Going down from i searches the vector for the element satisfying predicate
searchTopDown :: VU.Vector Int -> Int -> (Int -> Bool) -> (Int, Int)
searchTopDown vec i pred =
VU.head $ VU.dropWhile (not . pred . snd)
$ VU.reverse $ VU.take (1 + i)
$ VU.indexed vec
searchTd :: VU.Vector Int -> Int -> (Int -> Bool) -> (Int, Int)
searchTd vec i pred = let val = vec ! i
in
if pred val then (i, val) else searchTd vec (i-1) pred
However, benchmarking this code shows performance difference in orders of magnitude.
testTd/150
lower bound estimate upper bound
OLS regression 4.51 μs 4.53 μs 4.56 μs
R² goodness-of-fit 1.000 1.000 1.000
Mean execution time 4.52 μs 4.53 μs 4.55 μs
Standard deviation 35.2 ns 50.0 ns 69.8 ns
topDown/150
lower bound estimate upper bound
OLS regression 75.4 μs 76.6 μs 77.9 μs
R² goodness-of-fit 0.997 0.998 0.999
Mean execution time 74.6 μs 75.2 μs 76.1 μs
Standard deviation 1.58 μs 2.42 μs 3.75 μs
Questions: What prevents searchTopDown from being fused? What can I do to help it fuse? When should I expect fusion in general?
Code used for benchmarks:
import qualified Data.List as L
import Criterion.Main
vec :: VU.Vector Int
vec = fromList $ L.replicate 100 (-1) L.++ L.replicate 100 1
testTd :: Int -> Int
testTd i = fst $ searchTd vec i (<0)
testTopDown :: Int -> Int
testTopDown i = fst $ searchTopDown vec i (<0)
benchMain = defaultMain [
bgroup "testTd" [ bench "1" $ whnf testTd 1
, bench "90" $ whnf testTd 90
, bench "100" $ whnf testTd 100
, bench "150" $ whnf testTd 150
],
bgroup "topDown" [ bench "1" $ whnf testTopDown 1
, bench "90" $ whnf testTopDown 90
, bench "100" $ whnf testTopDown 100
, bench "150" $ whnf testTopDown 150
]
]
EDIT
Following @Zeta suggestions (adding -O2
) I get excellent performance from both functions. However, I experience this problem in a larger project so I changed benchmarking code to resemble it. searchTopDown
seems to be linear in i
for this case. New benchmarking code and results (in microsecs):
import Data.Vector.Unboxed (fromList, (!))
import qualified Data.Vector.Unboxed as VU
import qualified Data.Vector.Unboxed.Mutable as VM
import Control.Monad.ST
import qualified Data.List as L
datStream :: [(Int, Int)]
datStream = L.zip [0..] $ L.replicate 100 (-1) L.++ L.replicate 100 1
vecUpdate v (iChg, vChg) = runST $ do
mv <- VU.unsafeThaw v
VM.modify mv (+ vChg) iChg
VU.unsafeFreeze mv
vecs :: [VU.Vector Int]
vecs = let vecInit = VU.replicate 200 0 in L.drop 1 $ L.scanl' vecUpdate vecInit datStream
test f i = L.sum $ L.map (\x -> fst $ f x i (<0)) vecs
testTd = test searchTd
testTopDown = test searchTopDown