Space leak in Haskell - old compiler's fault or mine? Apparently the latter

153 views Asked by At

I recently took part in a competitive coding competition.

This Haskell gave a space leak on the judge system running ghc 7.6.3:

t n [] = 0
t n ('8':rest) = t (n+1) rest
t n (')':rest) = n + (t n rest)

main = getLine >>= (\l -> print (t 0 l))

After the competition the test cases were published. One of the failure cases was this: (a file containing 10^5 close parens): https://cses.fi/download/1/b575d19a75bf724b50fa4a399f8187b6d6edb4ccb62bd1a774f9294969152e46

The error was

Stack space overflow: current size 8388608 bytes. Use `+RTS -Ksize -RTS' to increase it.

I also know that the code was compiled with -O2 and -Wall on what I believe to be GHC 7.6.3.

For the life of me I can't reproduce the error with GHC 8.0.2 or 7.10.3 on my machine.

Is there an obvious space leak in the code?

Edit:

I tested the code as suggested below and a fold that was implemented thusly:

import Data.Foldable

t (kasit, score) '8' = (kasit+1, score)
t (kasit, score) _ = (kasit, score+kasit)

main = getLine >>= (\l -> print (snd (foldl' (t) (0, 0) l )))

Neither the bang pattern nor the reimplementation with the strict foldl' solved the problem, though the bangy one got more test cases through. The original still WOMM. Admittedly this is stepping outside of the scope of a generally useful stack exchange question and beginning to look like good old homework. This is also pretty undebugable without more knowledge of the judge system.

2

There are 2 answers

3
jberryman On BEST ANSWER

Yes, the n parameter exhibits an "obvious" (to some) space leak: because it doesn't need to be inspected (e.g. you don't have a case t 0 ... = ...) you can build up thunks of additions in your recursive calls.

Better would be something like:

t _ [] = 0
t !n ('8':rest) = t (n+1) rest
t !n (')':rest) = n + (t n rest)

best would be probably to define this in terms of foldl'.

It's totally possible that a newer version of GHC than 7.6 would do a better job analyzing strictness and optimizing this code.

Useful factoid, forcing stack overflows can be an effective way to find space leaks (which usual manifest as heap usage): http://neilmitchell.blogspot.com/2015/09/detecting-space-leaks.html

9
dfeuer On

I think your last case is causing you trouble. You wrote

t n [] = 0
t n ('8':rest) = t (n+1) rest
t n (')':rest) = n + (t n rest)

Even if we strictify this as jberryman suggested,

t !n [] = 0
t !n ('8':rest) = t (n+1) rest
t !n (')':rest) = n + (t n rest)

the third case is not tail recursive. How can we fix this? Simply add another accumulator representing the amount to add on at the end.

t n0 xs = t' 0 n0 xs
  where
    t' !acc !_n [] = acc
    t' acc n ('8':rest) = t' acc (n + 1) rest
    t' acc n (')':rest) = t' (acc + n) n rest