I have implemented the usual combination of lexer/parser/pretty-printer for reading in and printing a type in my code. I find there is redundancy between the lexer and the pretty-printer when it comes to plain-string regular expressions, usually employed for symbols, punctuation or separators.
For example, I now have
rule token = parse
| "|-" { TURNSTILE }
in my lexer.mll file, and a function like:
let pp fmt (l,r) =
Format.fprintf fmt "@[%a |-@ %a@]" Form.pp l Form.pp r
for pretty-printing. If I decide to change the string for TURNSTILE, I have to edit two places in the code, which I find less than ideal.
Apparently, ocamllex lets you give names to regular expressions and then refer to them within the mll file. So lexer.mll could be written as
let symb_turnstile = "|-"
rule token = parse
| symb_turnstile { TURNSTILE }
But this will not let me externally access symb_turnstile, say from my pretty-printing functions. In fact, after running ocamllex, there are no occurrences of symb_turnstile in lexer.ml. I cannot even refer to these identifiers in the OCaml epilogue of lexer.mll.
Is there any way of achieving this?
In the end, I went for the following style, which I stole from the sources of ocamllex itself (so I am guessing it's standard practice). A map from strings to tokens (here an association list) is defined in the preamble of lexer.mll, where Symb is a module defining turnstile as a string. Then, the lexing part of lexer.mll is purposely overly general: it matches punctuation, a regular expression matching a sequence of symbols, and looks the matched string up in the map. The pretty-printer can now refer to the same Symb.turnstile string instead of repeating the literal.
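The original snippets for this answer are not reproduced here, so what follows is only a minimal sketch of the idea in plain OCaml. Symb.turnstile, TURNSTILE, punctuation and pp come from the text above; punctuation_table, token_of_punctuation and the pp_elt parameter (standing in for Form.pp) are hypothetical names, and the token type is a stand-in for the one the parser would normally generate.

```ocaml
(* Single source of truth for the concrete syntax (would live in symb.ml). *)
module Symb = struct
  let turnstile = "|-"
end

(* Stand-in for the token type normally generated by the parser. *)
type token = TURNSTILE

(* Would sit in the preamble of lexer.mll: a map from strings to tokens,
   here an association list. *)
let punctuation_table = [ (Symb.turnstile, TURNSTILE) ]

(* Would be called from the action of a deliberately general rule, e.g.
     | punctuation as s { token_of_punctuation s }
   where punctuation is a regular expression defined in lexer.mll. *)
let token_of_punctuation s =
  match List.assoc_opt s punctuation_table with
  | Some tok -> tok
  | None -> failwith ("unknown symbol: " ^ s)

(* Pretty-printer: the symbol is passed as a %s argument rather than being
   hard-coded in the format string. pp_elt plays the role of Form.pp. *)
let pp pp_elt fmt (l, r) =
  Format.fprintf fmt "@[%a %s@ %a@]" pp_elt l Symb.turnstile pp_elt r
```

With this layout, changing the string for the turnstile means editing Symb.turnstile only. The price is that the lexer no longer rejects unknown symbols at the regular-expression level; they are caught by the table lookup instead.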