How to make FsLex rules tail-recursive / prevent StackOverflowExceptions

151 views Asked by At

I'm implementing a scripting language with fslex / fsyacc and have a problem with large user inputs (i.e. >1k scripts), specifically this error occurs when the lexer analyzes a very large string.

I scan strings in a custom lexer function (to allow users to escape characters like quotes with a backslash). Here's the function:

and myText pos builder = parse 
| '\\' escapable                 { let char = lexbuf.LexemeChar(1)
                                   builder.Append (escapeSpecialTextChar char) |> ignore
                                   myText pos builder lexbuf 
| newline                        { newline lexbuf; 
                                   builder.Append Environment.NewLine |> ignore 
                                   myText pos builder lexbuf 
                                 }
| (whitespace | letter | digit)+ { builder.Append (lexeme lexbuf) |> ignore
                                   myText pos builder lexbuf     
                                 } // scan as many regular characters as possible
| '"'                            { TEXT (builder.ToString())   } // finished reading myText
| eof                            { raise (LexerError (sprintf "The text started at line %d, column %d has not been closed with \"." pos.pos_lnum (pos.pos_cnum - pos.pos_bol))) }
| _                              { builder.Append (lexbuf.LexemeChar 0) |> ignore
                                   myText pos builder lexbuf 
                                 } // read all other characters individually

The function just interprets the various characters and then recursively calls itself – or returns the read string when it reads a closing quote. When the input is too large, a StackOverflowException will be thrown either in the _ or (whitespace | ... rule.

The generated Lexer.fs contains this:

(* Rule myText *)
and myText pos builder (lexbuf : Microsoft.FSharp.Text.Lexing.LexBuffer<_>) = _fslex_myText pos builder 21 lexbuf
// [...] 

and _fslex_myText pos builder _fslex_state lexbuf =
  match _fslex_tables.Interpret(_fslex_state,lexbuf) with
  | 0 -> ( 
# 105 "Lexer.fsl"
                                                   let char = lexbuf.LexemeChar(1) in
                                                   builder.Append (escapeSpecialTextChar char) |> ignore
                                                   myText pos builder lexbuf 
// [...]

So the actual rule becomes _fslex_myText and myText calls it with some internal _fslex_state.

I assume this is the problem: myText is not tail-recursive because it is transformed into 2 functions that call each other, which at some point results in a StackOverflowException.

So my question really is:

1) is my assumption correct? Or can the F# compiler optimize these 2 functions like it does with tail-recursive ones and generate a while-loop instead of recursive calls? (Functional programming is still new to me)

2) What can I do to make the function tail-recursive / prevent the StackOverflowException? FsLex documentation is not that extensive and I couldn't figure out what it is the official F# compiler does differently – obviously it has no problems with large strings, but its string rule also calls itself recursively, so I'm at a loss here :/

1

There are 1 answers

1
Jack P. On BEST ANSWER

As written, your myText function appears to be OK -- the F# compiler should be able to make that tail-recursive.

One thing to check -- are you compiling/running this in Debug or Release configuration? By default, the F# compiler only does tail-call optimization in the Release configuration. If you want tail-calls while you're debugging, you need to explicitly enable that in the project's properties (on the Build tab).