I'm using uu-parsinglib
, but I think the following question is parser combinator generic.
Let's consider the following example:
I've got a lexer with a combinator pLex
, which produces a list of tokens (of type MyToken
). I now want to write a parser, which will consume the tokens and build an AST
.
What is the best way to connect the lexer and parser? Right now I have a lex
function:
lex s = parse ( (,) <$> pLex <*> pEnd) (createStr (LineColPos 0 0 0) s)
Should I create a function parse p = ...
? If yes, how do I construct it to keep track of columns and lines from lexer? Or should I create a parserCombinator
, which would use the pLex
combinator somehow?
Table-based parsers require separation of lexical analysis and parsing because of their limited lookahead capability. Looking ahead far enough to combine lexical analysis into the parser would explode the state space.
Combinator-based approaches do not usually suffer this problem, as they are typically doing recursive-descent parsing. Unless otherwise noted by the library author, there is no harm in combining the phases and not much to gain by separating them.
Although uu-parsinglib provides the
Str
class to abstract over different string-like inputs, looking at its definition shows that it still assumes that you are ultimately reading a sequence of Char, whether they be from a String, ByteString, Text, etc. So trying to get it to parse a MyToken stream seems like it could be difficult. Parsec might be a better choice if you feel you need to do that.As to your question about your string implementation, combinators take a string-like input containing syntactic structure and return the corresponding semantic value, if they match. Inside the combinator, you get to build that semantic value from what you parse directly by taking from the input stream and by combining the semantic values from sub-combinators you call.
So, your 'String matching' combinator in your example will have a list of tokens in its scope thanks to the parsing it did. You can use the full power of Haskell to combine those tokens into a single MyString value in whatever way makes sense for your language: Maybe a 'SplicedString' type that represents what values are to be sliced into it.
The string combinator was probably called by an 'expression' combinator, which will be able to combine the MyString value with other parsed values into a MyExpression value. It's combinators returning semantic values all the way back up!