I am using the Parsec library to parse a string. The problem I have is that it can't differentiate some tokens because they are words with the same prefix. Simplifying the whole grammar (in my case it's not a regular one), say we have the following:
T0 := <empty> | tag0 T1 T0
T1 := tag1 | tag1 T1
So I may have strings like "tag0tag1tag1" or "tag0tag1tag0tag1", etc. Basically, there is a "tag0" string followed by an arbitrary (nonzero) number of "tag1" strings, and this whole unit can be repeated any number of times.
So what I tried was something like:
```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

wrongParser :: Parser String
wrongParser = do
  string "tag0"
  many $ string "tag1"
  return "Ok"
```
And tested with
```
ghci> parse wrongParser "Error" "tag0tag1tag1tag0tag1"
Left "Error" (line 1, column 13):
unexpected "0"
expecting "tag1"
```
So what seems to happen here is that the parser reads "tag" from "tag0" but expects "tag1" instead (because it is still inside `many`, reading "tag1"s).
Is there a way to make the parser take the tag string as a whole, so that instead of failing it just assumes that all the "tag1"s have already been read and stops without an error (maybe with a function other than `string`)? Or what is the correct way to handle this case?
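This is a common misconception with Parsec: a parser that fails after consuming input does not backtrack by default. `many p` stops successfully only when `p` fails without consuming anything; if `p` fails after consuming some input, `many p` fails as well. `string "tag1"` consumes characters as long as they match, so on "tag0" it consumes "tag" and only then fails on "0".

Therefore, your parser works like this:
1. `string "tag0"` consumes "tag0".
2. `many $ string "tag1"` consumes "tag1" twice.
3. On the third iteration, `string "tag1"` consumes "tag", sees "0" instead of "1" and fails with input already consumed, so `many` reports the error instead of stopping.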
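A minimal sketch that reproduces both cases, assuming parsec 3 and its `Text.Parsec.String.Parser` type; `tag1s` is a hypothetical name for the `many $ string "tag1"` part of the question's parser. In the first input the "x" already fails to match the "t" of "tag1", so `many` stops cleanly; in the second, `string "tag1"` consumes "tag" before failing on "0", so the whole parse fails.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical name for the `many $ string "tag1"` part of the question's parser.
tag1s :: Parser [String]
tag1s = many (string "tag1")

main :: IO ()
main = do
  -- "x" does not match the first character of "tag1": the third attempt
  -- fails without consuming input, so `many` stops with two "tag1"s.
  print (parse tag1s "" "tag1tag1xyz")
  -- "tag" matches before "0" does not: the third attempt fails after
  -- consuming input, so `many` fails too and the result is a Left.
  print (parse tag1s "" "tag1tag1tag0")
```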
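Probably you want to say, "if you fail reading tag1, just pretend you haven't read it at all". This is called backtracking, and you use the `try` combinator to make it happen: `try p` behaves like `p`, except that when `p` fails it pretends it hasn't consumed any input. So writing `many $ try (string "tag1")` instead of `many $ string "tag1"` lets `many` stop cleanly when it reaches the next "tag0". Call the fixed parser `rightParser`.

Now if you want to parse the whole string, you use `many rightParser`. For the sake of example, let's say you want to return all the "tag1"s: drop the `return "Ok"` so that `rightParser` returns the `[String]` produced by `many $ try (string "tag1")`, and `many rightParser` then returns one such list per "tag0" group.

Notice that other libraries (e.g. Attoparsec) always backtrack on failure. This is a design decision made by each library. Also, backtracking can be an expensive operation, so you may want to write your parser differently (for example, always parse "tag" and backtrack only on "0" or "1").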
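A minimal sketch of both options, assuming parsec 3 and `String` input. `rightParser` wraps `string "tag1"` in `try` and returns the "tag1"s of one group; `allGroups` runs `many rightParser` and, as an added assumption, ends with `eof` so leftover input is rejected. `afterTag1` and `document` are hypothetical names for the alternative without `try`: the grammar is left-factored so the parser always consumes "tag" and then picks a branch on the single next character. Because a single `char` either matches or fails without consuming input, `<|>` can fall through to the other branch with no backtracking at all. This version returns the number of "tag1"s per group rather than the strings.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Option 1: backtracking with try.
-- One "tag0" followed by its "tag1"s. `try` makes a failed "tag1" attempt
-- (for example on the next "tag0") look as if it consumed nothing,
-- so `many` stops instead of failing.
rightParser :: Parser [String]
rightParser = do
  _ <- string "tag0"
  many (try (string "tag1"))

-- Any number of groups; `eof` (an added assumption) rejects leftover input.
allGroups :: Parser [[String]]
allGroups = many rightParser <* eof

-- Option 2: left-factored, no try.
-- Called right after a group's first "tag1"; n is the number of "tag1"s
-- seen so far in the current group. Returns the count for every group.
afterTag1 :: Int -> Parser [Int]
afterTag1 n =
      (eof *> pure [n])
  <|> (string "tag" *> nextDigit)
  where
    -- A single char either matches or fails without consuming input,
    -- so <|> moves on to the other branch without needing try.
    nextDigit =
          (char '1' *> afterTag1 (n + 1))                          -- same group
      <|> (char '0' *> string "tag1" *> ((n :) <$> afterTag1 1))  -- new group

-- Empty input, or a "tag0" that must be followed by at least one "tag1".
document :: Parser [Int]
document =
      (eof *> pure [])
  <|> (string "tag0" *> string "tag1" *> afterTag1 1)

main :: IO ()
main = do
  print (parse allGroups "" "tag0tag1tag1tag0tag1") -- Right [["tag1","tag1"],["tag1"]]
  print (parse document "" "tag0tag1tag1tag0tag1")  -- Right [2,1]
```

If option 1 should also enforce the grammar's "at least one tag1" rule, `many1 (try (string "tag1"))` does that; option 2 already enforces it through the `string "tag1"` that follows each "tag0".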