Initially I have a ByteString, which I then unpack and convert into Int16s; this part of the process takes relatively little time. I then convert the list of Int16s into a Repa array with the following line:
Repa.fromListUnboxed (Z :. bytesOfDataPerImage `div` 2) listOfInts
According to the profiler this line is taking ~40% of CPU time, which could just be indicative that the computations I am performing don't warrant the use of Repa. Is there a more efficient route to take when going from ByteString to Repa array?
I have tried the Repa fromByteString function, though the transformation of
Array B DIM1 Word8 -> Array U DIM1 Int16
was pretty slow. I was performing this by first reshaping the array into a 2d array of Word8s, then folding into Int16s. Perhaps the Byte array was the right approach and my conversion method is just wrong.
convertImageData :: Array B DIM1 Word8 -> Array U DIM1 Int16
convertImageData !arr = Repa.foldS convertWords 0 (Repa.map fromIntegral (splitArray arr))
splitArray :: Array B DIM1 Word8 -> Array U DIM2 Word8
splitArray !arr = computeUnboxedS $ reshape (Z :. ((size $ extent arr) `div` 2) :. 2) arr
convertWords :: Int16 -> Int16 -> Int16
convertWords !word1 !word2 = (word1 `shiftL` 8) .|. word2
For some context, this program is being benchmarked against the same program written in C/C++.
Your initial approach of converting into a list and later calling `Repa.fromListUnboxed` is certainly very slow, since all you are doing is forcing the elements of a list and then loading them sequentially into the unboxed array. That is why the conversion into a list appears to take very little time: all it does is create a bunch of thunks, and the actual computation happens when you load them into the array.

Your second approach is definitely way better, but there are still unnecessary steps, e.g. there is no need to `reshape` the array; you can just pass the new size to the `fromByteString` function, which gives a slightly improved version.

The `fromByteString` function and the `B` representation in Repa aren't particularly fast for some reason, so there is a faster way to do it, namely to construct the array by directly indexing the `ByteString`.

Switching to sequential computation with `Repa.computeUnboxedS` will slow things down by a factor of 2, but since we are trying to optimize it, we need to go all the way with parallel computation.

Not to steal all the thunder from the very nice Repa library, I'd also like to show how all of that would work with the new massiv library.
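Returning to the Repa side, the direct-indexing approach described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names `int16At` and `bytesToInt16s` are hypothetical, and big-endian byte order is assumed so that the result matches `convertWords` from the question.

```haskell
{-# LANGUAGE BangPatterns #-}
{-# LANGUAGE TypeOperators #-}
module Main (main) where

import Data.Array.Repa (Array, DIM1, U, Z (..), (:.) (..))
import qualified Data.Array.Repa as Repa
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeIndex)
import Data.Int (Int16)

-- Hypothetical helper: combine the two bytes at offsets 2*i and 2*i + 1
-- into one big-endian Int16. unsafeIndex performs no bounds check, so the
-- caller must guarantee that 2*i + 1 < BS.length bs.
int16At :: BS.ByteString -> Int -> Int16
int16At !bs !i =
  (fromIntegral (unsafeIndex bs (2 * i)) `shiftL` 8)
    .|. fromIntegral (unsafeIndex bs (2 * i + 1))

-- Build a delayed array whose elements are read straight from the
-- ByteString, then materialise it in parallel as an unboxed array.
bytesToInt16s :: BS.ByteString -> IO (Array U DIM1 Int16)
bytesToInt16s !bs =
  Repa.computeUnboxedP (Repa.fromFunction (Z :. n) (\(Z :. i) -> int16At bs i))
  where
    n = BS.length bs `div` 2 -- a trailing odd byte, if any, is ignored

main :: IO ()
main = do
  arr <- bytesToInt16s (BS.pack [0x01, 0x02, 0xFF, 0xFE])
  print (Repa.toList arr) -- prints [258,-2]
```

No intermediate list or `Array B` is built here: each element is computed on demand from two byte reads. To benefit from `computeUnboxedP`, the program needs to be compiled with `-threaded` and run with `+RTS -N`.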
Just to present some concrete numbers showing the optimizations in action, here are some stripped-down criterion benchmarks:
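A benchmark along these lines could be set up roughly as in the sketch below. It is a hypothetical harness and produces no figures of its own here: `viaList` stands in for the list-based conversion from the question, `viaIndex` for the direct-indexing approach, the helper names and the 2,000,000-byte input are assumptions, and big-endian byte order is assumed as in `convertWords`.

```haskell
{-# LANGUAGE TypeOperators #-}
module Main (main) where

import Criterion.Main (bench, bgroup, defaultMain, env, whnf, whnfIO)
import Data.Array.Repa (Array, D, DIM1, U, Z (..), (:.) (..))
import qualified Data.Array.Repa as Repa
import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.ByteString.Unsafe (unsafeIndex)
import Data.Int (Int16)

-- Hypothetical helper: big-endian Int16 from the bytes at 2*i and 2*i + 1.
-- No bounds check is performed, so 2*i + 1 must be a valid offset.
int16At :: BS.ByteString -> Int -> Int16
int16At bs i =
  (fromIntegral (unsafeIndex bs (2 * i)) `shiftL` 8)
    .|. fromIntegral (unsafeIndex bs (2 * i + 1))

-- Stand-in for the question's approach: build a list, then load it.
viaList :: BS.ByteString -> Array U DIM1 Int16
viaList bs = Repa.fromListUnboxed (Z :. n) [int16At bs i | i <- [0 .. n - 1]]
  where
    n = BS.length bs `div` 2

-- Delayed array that indexes the ByteString directly; not yet computed.
viaIndex :: BS.ByteString -> Array D DIM1 Int16
viaIndex bs = Repa.fromFunction (Z :. n) (\(Z :. i) -> int16At bs i)
  where
    n = BS.length bs `div` 2

main :: IO ()
main =
  defaultMain
    [ -- env builds the input once, outside the timed region.
      env (pure (BS.replicate 2000000 0x7F)) $ \bs ->
        bgroup
          "ByteString to Int16 array"
          [ bench "fromListUnboxed" $ whnf viaList bs
          , bench "index + computeUnboxedS" $ whnf (Repa.computeUnboxedS . viaIndex) bs
          , bench "index + computeUnboxedP" $ whnfIO (Repa.computeUnboxedP (viaIndex bs))
          ]
    ]
```

Compile with `-O2 -threaded` and run with `+RTS -N` so that the `computeUnboxedP` case actually uses multiple cores; otherwise the parallel and sequential variants are not a fair comparison.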