To be more specific, I have the following innocuous-looking little Repa 3 program:
{-# LANGUAGE QuasiQuotes #-}
import Prelude hiding (map, zipWith)
import System.Environment (getArgs)
import Data.Word (Word8)
import Data.Array.Repa
import Data.Array.Repa.IO.DevIL
import Data.Array.Repa.Stencil
import Data.Array.Repa.Stencil.Dim2
main = do
[s] <- getArgs
img <- runIL $ readImage s
let out = output x where RGB x = img
runIL . writeImage "out.bmp" . Grey =<< computeP out
output img = map cast . blur . blur $ blur grey
where
grey = traverse img to2D luminance
cast n = floor n :: Word8
to2D (Z:.i:.j:._) = Z:.i:.j
---------------------------------------------------------------
luminance f (Z:.i:.j) = 0.21*r + 0.71*g + 0.07*b :: Float
where
(r,g,b) = rgb (fromIntegral . f) i j
blur = map (/ 9) . convolve kernel
where
kernel = [stencil2| 1 1 1
1 1 1
1 1 1 |]
convolve = mapStencil2 BoundClamp
rgb f i j = (r,g,b)
where
r = f $ Z:.i:.j:.0
g = f $ Z:.i:.j:.1
b = f $ Z:.i:.j:.2
Which takes this much time to process a 640x420 image on my 2Ghz core 2 duo laptop:
real 2m32.572s
user 4m57.324s
sys 0m1.870s
I know something must be quite wrong, because I have gotten much better performance on much more complex algorithms using Repa 2. Under that API, the big improvement I found came from adding a call to 'force' before every array transform (which I understand to mean every call to map, convolve, traverse etc). I cannot quite make out the analogous thing to do in Repa 3 - in fact I thought the new manifestation type parameters are supposed to ensure there is no ambiguity about when an array needs to be forced? And how does the new monadic interface fit into this scheme? I have read the nice tutorial by Don S, but there are some key gaps between the Repa 2 and 3 APIs that are little discussed online AFAIK.
More simply, is there a minimally impactful way to fix the above program's efficiency?
The new representation type parameters don't automagically force when needed (it's probably a hard problem to do that well) - you still have to force manually. In Repa 3 this is done with the computeP function:
I personally really don't understand why it's monadic, because you can just as well use Monad Identity:
So, now your
output
function can be rewritten with appropriate forcing:with an abbreviation
f
using a helper functionu
to aid type inference:With these changes, the speedup is quite dramatic - which is to be expected, as without intermediate forcing each output pixel ends up evaluating much more than is necessary (the intermediate values aren't shared).
Your original code:
With forcing:
Tested with a 600x400 png, the output files were identical.