I am trying to write a simple lexer that prints all words in its input, where a word is a maximal sequence of letters (a-zA-Z). All other characters must be ignored.
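For reference, the intended behaviour can be sketched in plain Haskell without Alex (`splitWords` is just an illustrative name, not part of my program):

```haskell
import Data.Char (isAlpha)

-- Split a string into maximal runs of letters, dropping everything else.
splitWords :: String -> [String]
splitWords s = case dropWhile (not . isAlpha) s of
  ""   -> []
  rest -> let (w, rest') = span isAlpha rest
          in w : splitWords rest'

main :: IO ()
main = print (splitWords "foo,bar2Baz!")  -- ["foo","bar","Baz"]
```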
My Alex program for this, which uses the basic-bytestring wrapper, uses as much memory as the size of the input. I would have expected it to run in constant memory.
The heap profile produced with -hc just shows a single block of pinned memory rapidly rising to the size of the input and then slowly decreasing to zero.
Interestingly, when using the basic wrapper and ordinary Strings, only constant memory is used.
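The String-based variant looks roughly like this (a sketch; the action type for the basic wrapper is `String -> token`, so `id` returns the matched word as-is):

```
{
module Main where
}

%wrapper "basic"

$letters = [a-zA-Z]
$nonletters = [~$letters\n]

tokens :-

  $nonletters+  ;
  $letters+     { id }

{
main :: IO ()
main = do
  buf <- getContents
  let toks = alexScanTokens buf
  mapM_ putStrLn toks
}
```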
The Alex input file is:
```
{
module Main where

import Data.ByteString.Lazy as B
}

%wrapper "basic-bytestring"

$letters = [a-zA-Z]
$nonletters = [~$letters\n]

tokens :-

  $nonletters+  ;
  $letters+     { B.copy }

{
main :: IO ()
main = do
  buf <- B.getContents
  let toks = alexScanTokens buf
  mapM_ B.putStrLn toks
}
```
When run on an input of size 10 MB, the output of +RTS -s is:
```
   2,924,029,784 bytes allocated in the heap
       7,869,696 bytes copied during GC
       9,958,560 bytes maximum residency (5 sample(s))
       1,423,704 bytes maximum slop
              22 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      5634 colls,     0 par    0.06s    0.05s     0.0000s    0.0002s
  Gen  1         5 colls,     0 par    0.00s    0.00s     0.0004s    0.0011s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.79s  (  2.81s elapsed)
  GC      time    0.06s  (  0.06s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    2.85s  (  2.86s elapsed)

  %GC     time       2.0%  (1.9% elapsed)

  Alloc rate    1,047,072,808 bytes per MUT second

  Productivity  98.0% of total user, 97.6% of total elapsed
```
I would appreciate any help on where I am going wrong, or on how I can investigate this problem further.