Haskell Alex: basic-bytestring lexer leaks memory

177 views Asked by At

I am trying to write a simple lexer that will print all words in its input, where a word is a maximal sequence of letters a-zA-Z. All other characters must be ignored.

My Alex program for this which uses the basic-bytestring wrapper uses as much memory as the size of the input. I would have expected it to run in constant memory.

The heap profile using -hc just shows a single block of pinned memory rapidly rising to the size of the input and then slowly decreasing to 0.

Interestingly when using the basic wrapper and ordinary strings only constant memory is used.

The Alex input file is

{
module Main where
import Data.ByteString.Lazy as B
}

%wrapper "basic-bytestring"

$letters = [a-zA-Z]
$nonletters = [~$letters\n]

tokens :-
  $nonletters+  ;
  $letters+     {B.copy}

{
main = do
  buf <- B.getContents
  let toks = alexScanTokens buf
  mapM_ B.putStrLn toks
}

When run with an input of size 10M the output of +RTS -s is

   2,924,029,784 bytes allocated in the heap
       7,869,696 bytes copied during GC
       9,958,560 bytes maximum residency (5 sample(s))
       1,423,704 bytes maximum slop
              22 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5634 colls,     0 par    0.06s    0.05s     0.0000s    0.0002s
  Gen  1         5 colls,     0 par    0.00s    0.00s     0.0004s    0.0011s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.79s  (  2.81s elapsed)
  GC      time    0.06s  (  0.06s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    2.85s  (  2.86s elapsed)

  %GC     time       2.0%  (1.9% elapsed)

  Alloc rate    1,047,072,808 bytes per MUT second

  Productivity  98.0% of total user, 97.6% of total elapsed

I would appreciate any help on where I am going wrong or how I can further investigate this problem.

0

There are 0 answers