I have implemented the usual combination of lexer/parser/pretty-printer for reading in and printing a type in my code. I find there is redundancy between the lexer and the pretty-printer when it comes to plain-string regular expressions, usually employed for symbols, punctuation or separators.
For example I now have
rule token = parse
| "|-" { TURNSTILE }
in my lexer.mll file, and a function like:
let pp fmt (l,r) =
Format.fprintf fmt "@[%a |-@ %a@]" Form.pp l Form.pp r
for pretty-printing. If I decide to change the string for TURNSTILE, I have to edit two places in the code, which I find less than ideal.
Apparently, ocamllex lets you give names to regular expressions and then refer to them within the mll file. So lexer.mll could be written as
let symb_turnstile = "|-"
rule token = parse
| symb_turnstile { TURNSTILE }
But this will not let me access symb_turnstile externally, say from my pretty-printing functions. In fact, after running ocamllex, there are no occurrences of symb_turnstile in lexer.ml. I cannot even refer to these identifiers in the OCaml epilogue of lexer.mll.
Is there any way of achieving this?
In the end, I went for the following style, which I stole from the sources of ocamllex itself (so I am guessing it's standard practice). A map from strings to tokens (here an association list) is defined in the preamble of lexer.mll, with the strings taken from Symb, a module defining turnstile as a string. Then the lexing part of lexer.mll is purposely overly general: it matches punctuation, a regular expression matching a sequence of symbols, and looks the matched string up in the map. The pretty-printer can now be written in terms of the same string from Symb, so the symbol is defined in only one place.
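The code that accompanied this answer is not reproduced here, so what follows is only a minimal sketch of the idea in plain OCaml, not the original listing. The names symbol_table, token_of_symbol and Unknown_symbol are hypothetical, the token type stands in for the one the parser would normally provide, and pp takes the formula printer as an argument in place of Form.pp so that the block runs on its own.

```ocaml
(* Single source of truth for the concrete syntax (module name from the text). *)
module Symb = struct
  let turnstile = "|-"
end

(* Stand-in for the parser's token type; other tokens are omitted. *)
type token = TURNSTILE

(* Association list from punctuation strings to tokens. In the real project
   this would sit in the header section of lexer.mll. Hypothetical name. *)
let symbol_table = [ (Symb.turnstile, TURNSTILE) ]

(* Hypothetical exception for punctuation that is not in the table. *)
exception Unknown_symbol of string

(* What the lexer action would call on the matched lexeme, for instance
     | punctuation as s { token_of_symbol s }
   where punctuation is the deliberately general regular expression. *)
let token_of_symbol s =
  match List.assoc_opt s symbol_table with
  | Some tok -> tok
  | None -> raise (Unknown_symbol s)

(* Pretty-printer for a pair of formulas; pp_form stands in for Form.pp.
   The turnstile is spliced in with %s instead of being hard-coded. *)
let pp pp_form fmt (l, r) =
  Format.fprintf fmt "@[%a %s@ %a@]" pp_form l Symb.turnstile pp_form r

let () =
  (* Strings play the role of formulas in this demonstration. *)
  Format.printf "%a@." (pp Format.pp_print_string) ("A", "B")
```

With this layout, changing Symb.turnstile changes both what the lexer accepts and what the pretty-printer emits. The cost is that the lexer rule no longer lists each symbol explicitly, so an unknown piece of punctuation is only rejected at lookup time.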