I have a strings like that: ***, **(*)*, ****(**(**)*)**
And I want to parse it in the data structure like that:
data Tree = Node [S] Tree Tree | Empty where S is *
(* doesn't mean any char it is just star symbol)
I tried to build parser (I use megaparsec but it is very similar to habitual parsec):
data Tree = Node [Char] Tree Tree | Empty deriving (Show)
chain :: Parser Tree
chain = do
line <- many $ char '*'
branch <- between (char '(') (char ')') chain
cont <- (eof >> return Empty) <|> chain
return $ Node line branch cont
test = parseTest chain "****(*(*)***)*(*)**"
But It doesn't work. I tried many ways but can't struggle with it.
Let's start with a simpler test case:
Note that there's a parse error after reading the first star. The input ended, but the parser expected to read either another star or an open parenthesis.
Looking at your parser, it's clear that:
succeeds, by reading the first string of stars, but the next line:
requires an open parenthesis in the input, and this is not made optional in any way.
We could fix this by writing:
Now, the parser works okay on
"***", but it hangs on"**(*)*". The problem is the line:This tries to decide when to stop parsing based on detecting the end of input, but this only works at the top level
chaincall where the end of the current tree corresponds to the end of the input -- in a recursive call, the tree can end before the input does, so this won't work.Specifically, in the test case
"**(*)*", when parsing the tree inside the parentheses, namely*, we getlineset to*,branchset toEmpty, and then thecontline sees that we aren't at the end of the input (since the rest of the input")*"is still to be read) and recursively callschain. In this recursive call,linegets set to the empty string,branchgets set toEmpty, and thecontline again causes a recursive call tochain, and we have an infinite loop.Instead, let's write a parser
treethat parses thelineof a tree:and now optionally a
treein parenthesis (for the left hand side):If there is no left-hand side, then there can't be a right hand side either (convince yourself that this is true! -- try to write a tree that has no left hand side in parenthesis but still has a non-empty right hand side, and you'll see it can't be done), so we're finished:
If there is a left-hand side, then read the right hand side tree (which might be empty, but that's okay) and return the node:
The whole parser looks like:
and hopefully does what you expect:
This parser just ignores trailing input:
but you can write a wrapper to require the end of the outermost tree correspond to the end of input: