Space leak with redundant use of seq in GHC interpreter

485 views Asked by At

I type this code into the interpreter and memory is rapidly consumed:

last [1..10^7] `seq` ()

I can't see why this needs more than O(1) space. If i do just (which should be the same, because Show forces weak head normal form, so seq is redundant?):

last [1..10^7]

...it works fine.

I'm unable to reproduce this situation outside the interpreter.

What's going on here?


Here are some test cases: http://hpaste.org/69234

Things to note:

  • By running in the interpreter, I load wtf.hs without compiling it, and type wtf<n> in ghci.
  • By compiling, I do ghc --make wtf.hs && ./wtf.
  • last can be substituted for a sum with strict accumulator or a function that finds the max element in the list, and the space leak still happens
  • I haven't seen this behaviour when using $! instead of seq.
  • I tried adding a dummy () parameter because I thought maybe this is a CAF problem. Changes nothing.
  • It's probably not a problem with functions on Enum, because I can reproduce the behaviour with wtf5 and later, which don't use Enum at all.
  • It's probably not a problem with Num, Int, or Integer, because I can reproduce the behaviour without them in wtf14 and wtf16.

I tried reproducing the problem with Peano arithmetic to take lists and integers out of the equation (fetching the at the end of of 10^9), but ran into other sharing/space leak problems I don't understand when trying to implement *.

2

There are 2 answers

2
Don Stewart On

We need to look at the instance of enumFromTo for Integer, and last:

last []                 =  errorEmptyList "last"
last (x:xs)             =  last' x xs
  where last' y []     = y
        last' _ (y:ys) = last' y ys

It is defined in GHC.Enum as:

enumFrom x             = enumDeltaInteger  x 1
enumFromThen x y       = enumDeltaInteger  x (y-x)
enumFromTo x lim       = enumDeltaToInteger x 1 lim

where

enumDeltaInteger :: Integer -> Integer -> [Integer]
enumDeltaInteger x d = x `seq` (x : enumDeltaInteger (x+d) d)
-- strict accumulator, so
--     head (drop 1000000 [1 .. ]
-- works

and

enumDeltaToInteger :: Integer -> Integer -> Integer -> [Integer]
enumDeltaToInteger x delta lim
  | delta >= 0 = up_list x delta lim
  | otherwise  = dn_list x delta lim

up_list :: Integer -> Integer -> Integer -> [Integer]
up_list x0 delta lim = go (x0 :: Integer)
                where
                    go x | x > lim   = []
                         | otherwise = x : go (x+delta)

last is fully lazy, as expected.

For the Integer Enum class, we have a strict accumulator (explicitly) for enumFrom. In the bounded case (e.g. [1..n]), it calls enumDeltaToInteger and then into up_list, which uses a worker to unfold a list until its limit is reached.

But up_list is strict in x in the go helper (see the comparison against lim).

When run in GHCi none of this is optimized, yielding naive calls to enumFromTo, before returning ().

let
  it_ax6 :: ()      
  it_ax6 =
    case last
           @ GHC.Integer.Type.Integer
           (GHC.Enum.enumFromTo
              @ GHC.Integer.Type.Integer
              GHC.Num.$fEnumInteger
              (GHC.Integer.smallInteger 1)
              (GHC.Real.^
                 @ GHC.Integer.Type.Integer
                 @ GHC.Integer.Type.Integer
                 GHC.Num.$fNumInteger
                 GHC.Real.$fIntegralInteger
                 (GHC.Integer.smallInteger 10)
                 (GHC.Integer.smallInteger 7)))
    of _ -> GHC.Unit.()
in
  GHC.Base.thenIO
    @ ()
    @ [()]
    (System.IO.print @ () GHC.Show.$fShow() it_ax6)
    (GHC.Base.returnIO
       @ [()] (GHC.Types.: @ () it_ax6 (GHC.Types.[] @ ())))

So, why are we retaining the list in the seq case, and not in the regular case? The regular case runs nicely in constrant space, relying on the laziness of enumFromTo for Integer and for last. The GHCi core for that case looks like:

let {
  it_aKj :: GHC.Integer.Type.Integer
  [LclId,
   Unf=Unf{Src=<vanilla>, TopLvl=False, Arity=0, Value=False,
           ConLike=False, Cheap=False, Expandable=False,
           Guidance=IF_ARGS [] 170 0}]
  it_aKj =
    GHC.List.last
      @ GHC.Integer.Type.Integer
      (GHC.Enum.enumFromTo
         @ GHC.Integer.Type.Integer
         GHC.Num.$fEnumInteger
         (GHC.Integer.smallInteger 1)
         (GHC.Real.^
            @ GHC.Integer.Type.Integer
            @ GHC.Integer.Type.Integer
            GHC.Num.$fNumInteger
            GHC.Real.$fIntegralInteger
            (GHC.Integer.smallInteger 10)
            (GHC.Integer.smallInteger 7))) } in
GHC.Base.thenIO
  @ ()
  @ [()]
  (System.IO.print
     @ GHC.Integer.Type.Integer GHC.Num.$fShowInteger it_aKj)
  (GHC.Base.returnIO
     @ [()]
     (GHC.Types.:
        @ ()
        (it_aKj
         `cast` (UnsafeCo GHC.Integer.Type.Integer ()
                 :: GHC.Integer.Type.Integer ~ ()))
        (GHC.Types.[] @ ())))

So these are almost identical, with the differences being:

  • in the seq version, last (enumFromTo ..) is forced inside a case.
  • in the regular version, it is a lazy let.
  • in the seq version, the value is computed then discarded, yielding a () -- nothing looks at the result
  • in the regular case, it is inspected and shown.

What is odd is that there's nothing magic about:

let x = case last (enumFromTo 1 n) of _ -> ()

that makes it retain values.

As we see, the implementation of up_list is strict in its accumulator (since it compares against lim, and the list is unfolded lazily -- so last should be able to consume it in constant space). Writing the expression by hand confirms this.

Doing a heap profile of the ghci execution shows the entire list being retained:

enter image description here

which tells us at least that it isn't a chain of thunks, but rather, the entire list is being built strictly and held on to, until being discarded.

So the mystery is: what is holding onto the list argument to last in ghci, and not in ghc?

I suspect some internal (or subtle) detail of ghci now -- I think this is worth a ghci ticket.

3
Chris Kuklewicz On

I think @n.m is right. Nothing is forcing the value in the list, so the 1+1+1+1+... thunk eventually kills space.

I will whip up a quick test.