I have the following Haskell program for computing a maximum sum substring of a string of integers:
{-# LANGUAGE BangPatterns #-} {-# OPTIONS_GHC -O2 #-}
import Data.Functor
import Data.Maybe
import Data.ByteString.Lazy.Char8 (getContents,lines,readInt,words)
import Prelude hiding (getContents,words,lines)
main = do
cont <- words <$> getContents
putStrLn $ show $ snd $ foldl opt (0,0) $ map (fst.fromJust.readInt) cont
opt (!c,!m) x = (max 0 (c+x),max m (c+x))
The problem with this program is that it reads the whole file into memory. The corresponding program without BytesString does not have this problem:
{-# LANGUAGE BangPatterns #-} {-# OPTIONS_GHC -O2 #-}
import Data.Functor
import Data.Maybe
main = do
cont <- words <$> getContents
putStrLn $ show $ snd $ foldl opt (0,0) $ map read cont
opt (!c,!m) x = (max 0 (c+x),max m (c+x))
It only uses a small constant amount of memory, but of course it is excruciatingly slow (about 25x slower).
The problem only occurs for programs that read very large lines. If the input is spread over multiple small lines, ByteString performs as expected.
Is there any way around this?
The use of lazy tuples there is sub-optimal. This is better rewritten as:
So you get a strict, unboxed accumulator. However, you're better off writing this whole thing as an incremental left fold. that's why
readInt
returns the remaining input in its 2nd parameter. No need for the sum . map . words pipeline.The version you submitted leaks space. Run on a large file, and it uses heap proportional to the file size (on 640k entries).
So it is retaining the file, as you say.
So what is retaining memory? One clue is the foldl with
opt
. You pass it a lazy tuple. Andfoldl
is lazy in its accumulator.So you're simply building up a long chain of unevaluated
+
operations. The bang patterns onopt
make no difference, sincefoldl
never evaluates its accumulator. Only when you finally inspect the result at the end does the whole thing collapse.This is a classic space leak. So: