I've written several compilers and am familiar with lexers, regexes/NFAs/DFAs, parsers, and semantic rules in flex/bison, JavaCC, JavaCup, ANTLR4, and so on.
Is there some sort of magical monadic operator that seamlessly grows/combines a token from a mix of `Parser Char` (i.e. `Text.Megaparsec.Char`) and `Parser String`?
Is there a way, or a best practice, to represent a clean separation between lexing tokens and nonterminal expectations?
Typically, one uses applicative operations to combine `Parser Char` and `Parser String` parsers directly, rather than "upgrading" the former. That covers a parser for alphanumeric identifiers that must start with a letter, and it also covers something more complicated, like parsing dollar amounts with optional cents.
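A minimal sketch of both cases, assuming the common `type Parser = Parsec Void String` alias (the question does not show its own definition of `Parser`). The names `ident` and `dollars` are illustrative only:

```haskell
import Control.Monad (replicateM)
import Data.Void (Void)
import Text.Megaparsec (Parsec, many, option, parseMaybe, some)
import Text.Megaparsec.Char (alphaNumChar, char, digitChar, letterChar)

-- Assumption: a String-input parser with no custom error component.
type Parser = Parsec Void String

-- An identifier: one letter followed by any number of alphanumerics.
-- (:) conses the Char result onto the String result, so the
-- Parser Char never has to be "upgraded".
ident :: Parser String
ident = (:) <$> letterChar <*> many alphaNumChar

-- A dollar amount such as "$12" or "$12.50".
-- The lambda receives one Char and two Strings and glues them together.
dollars :: Parser String
dollars =
  (\sign whole cents -> sign : whole ++ cents)
    <$> char '$'
    <*> some digitChar
    <*> option "" ((:) <$> char '.' <*> replicateM 2 digitChar)
```

`parseMaybe` is imported only for trying these out in GHCi: `parseMaybe ident "x1"` gives `Just "x1"`, and `parseMaybe dollars "$12.50"` gives `Just "$12.50"`. Note that once the `'.'` has been consumed, exactly two digits are required, so `"$12."` is rejected.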
If you find yourself building a `Parser String` out of a complicated sequence of `Parser Char` and `Parser String` parsers in a lot of situations, then you could define a few helper operators. If you find the variety of operators annoying, you could just define `(<++>)` and a short form for `charToStr`, like `c :: Parser Char -> Parser String`, so that a parser reads as a single chain of string-valued pieces.
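One possible shape for those helpers, as a sketch. Only the names `(<++>)`, `charToStr`, and `c` and the type of `c` come from the text above; the bodies, the fixity declaration, and the `Parser` alias are assumptions:

```haskell
import Control.Monad (replicateM)
import Data.Void (Void)
import Text.Megaparsec (Parsec, option, parseMaybe, some)
import Text.Megaparsec.Char (char, digitChar)

-- Assumption: a String-input parser with no custom error component.
type Parser = Parsec Void String

-- Run two string parsers in sequence and concatenate their results.
-- Assumed fixity: same as (++), so chains nest to the right.
infixr 5 <++>
(<++>) :: Parser String -> Parser String -> Parser String
p <++> q = (++) <$> p <*> q

-- Turn a single-character parser into a one-character string parser.
charToStr :: Parser Char -> Parser String
charToStr = fmap (:[])

-- Short form of charToStr.
c :: Parser Char -> Parser String
c = charToStr

-- The dollar-amount parser, written as one chain of string parsers.
dollars :: Parser String
dollars =
  c (char '$') <++> some digitChar
    <++> option "" (c (char '.') <++> replicateM 2 digitChar)
```

With a single operator every piece has to be a `Parser String`, which is why each `Parser Char` is wrapped in `c`. The trade-off is a little extra noise at the character parsers in exchange for not having to remember several operator variants.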
As @leftroundabout says, there's nothing hackish about `fmap (:[])`. If you think it looks clearer, write `fmap (\c -> [c])` instead.
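To make the equivalence concrete, here is a small sketch that uses only `base`. The function names are hypothetical, and the third spelling, `fmap pure`, is an addition that is not mentioned above:

```haskell
-- Three spellings of "wrap each result in a singleton list".
-- They work for any Functor, a parser included.
wrapSection, wrapLambda, wrapPure :: Functor f => f a -> f [a]
wrapSection = fmap (:[])          -- operator section: \x -> x : []
wrapLambda  = fmap (\x -> [x])    -- explicit lambda
wrapPure    = fmap pure           -- pure for lists builds a singleton
```

For example, all three map `Just 'x'` to `Just "x"`.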