I'm following Appel's "Modern Compiler Implementation in ML" and am writing the lexer with ocamllex.
The specification asks the lexer to return string literals with their escape sequences already translated. The following is an excerpt from my ocamllex input file:
    rule tiger = parse
    ...
      | '"'
          { let buffer = Buffer.create 1 in
            STRING (stringl buffer lexbuf)
          }
    and stringl buffer = parse
      | '"' { Buffer.contents buffer }
      | "\\t" { Buffer.add_char buffer '\t'; stringl buffer lexbuf }
      | "\\n" { Buffer.add_char buffer '\n'; stringl buffer lexbuf }
      | "\\n" { Buffer.add_char buffer '\n'; stringl buffer lexbuf }
      | '\\' '"' { Buffer.add_char buffer '"'; stringl buffer lexbuf }
      | '\\' '\\' { Buffer.add_char buffer '\\'; stringl buffer lexbuf }
      | eof { raise End_of_file }
      | _ as char { Buffer.add_char buffer char; stringl buffer lexbuf }
Is there a better way?
You may be interested in looking at how the OCaml lexer does this (search for "and string"). In essence, it uses the same method as yours, with three differences: it lacks the nice local buffer (I find your code nicer on this point, although it is a bit less efficient), it is a bit more complex because it supports more escape sequences, and it uses an escape table (char_for_backslash) to factor out similar rules.

Also, you have the rule "\\n" repeated twice, and I think 1 is a very pessimistic estimate of your string length; I would rather use 20 here (to avoid needless resizing).
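
To illustrate the escape-table idea, here is a minimal sketch in plain OCaml rather than ocamllex, so that it can be run on its own. The name char_for_backslash is taken from the suggestion above, but its body here is only a guess at what such a table might look like for the four escapes in the question. The function unescape is a hypothetical stand-in for the stringl rule; it is not part of any library. In the actual lexer, the same table could be called from a single rule that matches a backslash followed by one of the escape characters, replacing the four separate escape rules.

```ocaml
(* Assumed escape table: maps the character that follows a backslash
   to the character it denotes. Only the escapes from the question
   are covered; '\\' and '"' stand for themselves. *)
let char_for_backslash = function
  | 'n' -> '\n'
  | 't' -> '\t'
  | c -> c

(* Hypothetical stand-in for the stringl rule. [s] is the input that
   follows the opening quote; the result is the string contents up to
   the closing quote, with escapes translated. *)
let unescape s =
  (* 20 is the initial size suggested above; the buffer still grows
     on demand for longer strings. *)
  let buffer = Buffer.create 20 in
  let n = String.length s in
  let rec loop i =
    if i >= n then raise End_of_file (* no closing quote found *)
    else
      match s.[i] with
      | '"' -> Buffer.contents buffer
      | '\\' when i + 1 < n && String.contains "nt\\\"" s.[i + 1] ->
          (* One branch handles every known escape via the table. *)
          Buffer.add_char buffer (char_for_backslash s.[i + 1]);
          loop (i + 2)
      | c ->
          (* Any other character, including a backslash that does not
             start a known escape, is copied unchanged. *)
          Buffer.add_char buffer c;
          loop (i + 1)
  in
  loop 0
```

Adding a new escape then means extending the table and the set of accepted characters, rather than writing another near-identical rule.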