I am implementing a fractal image compression algorithm of binary images in Haskell. For this purpose i have to find to a given range block (a sub-image) the closest image in a so called domain pool, a list of lists of images. I am comparing images by calculating the sum square difference of both their pixel values.
I use the Haskell Image Processing (HIP) library for reading and writing images.
compress :: Image VS X Bit -> Int -> [(Int, Int)]
compress img blockSize = zip dIndices tIndices
where rImg = img
dImg = downsample2 rImg
rBlocks = (toBlocks rImg blockSize) :: [Image VS X Bit]
dBlocks = (toBlocks dImg blockSize) :: [Image VS X Bit]
dPool = (createDPool dBlocks) :: [[Image VS X Bit]]
distanceLists = map (\x -> (map.map) (distance x) dPool) rBlocks
dIndices = map (fst . getMinIndices) distanceLists
tIndices = map (snd . getMinIndices) distanceLists
distance :: Image VS X Bit -> Image VS X Bit-> Int
distance x y = sumSquareDifference (toBinList x) (toBinList y)
where toBinList = map (toNum . extractBitOfPixel) . concat . toLists
toLists :: MArray arr cs e => Image arr cs e -> [[Pixel cs e]]
toLists img = [[index img (i, j) | j <- [0..cols img -1]] | i <- [0.. rows img -1]]
extractBitOfPixel :: Pixel X Bit -> Bit
extractBitOfPixel (PixelX b) = b
sumSquareDifference :: [Int] -> [Int] -> Int
sumSquareDifference a b = sum $ zipWith (\x y -> (x-y)^2) a b
The performance of this code is really bad. Compressing a 256x256 image with a block size of 2 takes around 5 minutes despite compiling with -O2
. Profiling shows me that most of the runtime is spent in the function distance
, especially in sumSquareDifference
, but also in toLists
and toBinList
:
binaryCompressionSimple +RTS -p -RTS
total time = 1430.89 secs (1430893 ticks @ 1000 us, 1 processor)
total alloc = 609,573,757,744 bytes (excludes profiling overheads)
COST CENTRE MODULE SRC %time %alloc
sumSquareDifference Main binaryCompressionSimple.hs:87:1-63 30.9 28.3
toLists Main binaryCompressionSimple.hs:66:1-90 20.3 47.0
distance.toBinList Main binaryCompressionSimple.hs:74:11-79 10.9 15.1
main Main binaryCompressionSimple.hs:(14,1)-(24,21) 7.3 0.0
compress Main binaryCompressionSimple.hs:(28,1)-(36,60) 6.9 0.0
distance Main binaryCompressionSimple.hs:(71,1)-(74,79) 5.7 0.9
compress.distanceLists.\ Main binaryCompressionSimple.hs:34:38-65 5.2 4.4
compress.distanceLists Main binaryCompressionSimple.hs:34:11-74 2.8 0.0
main.\ Main binaryCompressionSimple.hs:20:72-128 2.7 0.0
getMinIndices.getMinIndex Main binaryCompressionSimple.hs:116:11-53 2.7 1.8
sumSquareDifference.\ Main binaryCompressionSimple.hs:87:52-58 2.7 2.5
Is there a way to improve performance?
A block size of 2 means comparing 16384 range blocks each with 131072 images of the domain pool, so sumSquareDifference will be called (16384*131072=)2147483648 times and calculate each time the sum square difference of two lists with length=4. I realize this is a large number but shouldn't the code be faster anyway (lazy evaluating of lists)? Is this a Haskell problem or an algorithm problem?
Edit:
I was able to at least improve the performance by a third by using:
distance :: Image VS X Bit -> Image VS X Bit-> Int
distance x y
| x == y = 0
| otherwise = sumSquareDifference (toBinList x) (toBinList y)
where toBinList = map (toNum . extractBitOfPixel) . concat . inlinedToLists
Edit 2:
I was able to increase the performance enormously by creating dPool
with the function genDistanceList
, which stops the calculation as soon as two identical images are found:
genDistanceList :: [[Image VS X Bit]] -> Image VS X Bit -> [[Int]]
genDistanceList dPool rBlock = nestedTakeWhileInclusive (/= 0) $ (map.map) (distance rBlock) dPool
The absolute first thing to try is skipping the conversion to lists:
Since you haven't provided us with a minimal reproducible example, it's impossible to tell what effect, if any, that would have. If you want better advice, provide better data.
EDIT
Looking through the haddocks for hip, I suspect the following will be even better still:
Here, the
fold
andzipWith
are the ones provided byhip
, notbase
.