I'm implementing a scripting language with fslex / fsyacc and have a problem with large user inputs (i.e. >1k scripts), specifically this error occurs when the lexer analyzes a very large string.
I scan strings in a custom lexer function (to allow users to escape characters like quotes with a backslash). Here's the function:
and myText pos builder = parse
| '\\' escapable { let char = lexbuf.LexemeChar(1)
builder.Append (escapeSpecialTextChar char) |> ignore
myText pos builder lexbuf
| newline { newline lexbuf;
builder.Append Environment.NewLine |> ignore
myText pos builder lexbuf
}
| (whitespace | letter | digit)+ { builder.Append (lexeme lexbuf) |> ignore
myText pos builder lexbuf
} // scan as many regular characters as possible
| '"' { TEXT (builder.ToString()) } // finished reading myText
| eof { raise (LexerError (sprintf "The text started at line %d, column %d has not been closed with \"." pos.pos_lnum (pos.pos_cnum - pos.pos_bol))) }
| _ { builder.Append (lexbuf.LexemeChar 0) |> ignore
myText pos builder lexbuf
} // read all other characters individually
The function just interprets the various characters and then recursively calls itself – or returns the read string when it reads a closing quote. When the input is too large, a StackOverflowException
will be thrown either in the _
or (whitespace | ...
rule.
The generated Lexer.fs
contains this:
(* Rule myText *)
and myText pos builder (lexbuf : Microsoft.FSharp.Text.Lexing.LexBuffer<_>) = _fslex_myText pos builder 21 lexbuf
// [...]
and _fslex_myText pos builder _fslex_state lexbuf =
match _fslex_tables.Interpret(_fslex_state,lexbuf) with
| 0 -> (
# 105 "Lexer.fsl"
let char = lexbuf.LexemeChar(1) in
builder.Append (escapeSpecialTextChar char) |> ignore
myText pos builder lexbuf
// [...]
So the actual rule becomes _fslex_myText
and myText
calls it with some internal _fslex_state
.
I assume this is the problem: myText
is not tail-recursive because it is transformed into 2 functions that call each other, which at some point results in a StackOverflowException
.
So my question really is:
1) is my assumption correct? Or can the F# compiler optimize these 2 functions like it does with tail-recursive ones and generate a while-loop instead of recursive calls? (Functional programming is still new to me)
2) What can I do to make the function tail-recursive / prevent the StackOverflowException
? FsLex documentation is not that extensive and I couldn't figure out what it is the official F# compiler does differently – obviously it has no problems with large strings, but its string rule also calls itself recursively, so I'm at a loss here :/
As written, your
myText
function appears to be OK -- the F# compiler should be able to make that tail-recursive.One thing to check -- are you compiling/running this in
Debug
orRelease
configuration? By default, the F# compiler only does tail-call optimization in theRelease
configuration. If you want tail-calls while you're debugging, you need to explicitly enable that in the project's properties (on theBuild
tab).