Apparently I'm too dumb to figure this out...
Consider the following string:
foobar(123, 456, 789)
I'm trying to work out how to parse this. In particular,
call = do
  cs <- many1 letter
  char '('
  as <- many argument
  return (cs, as)

argument = manyTill anyChar (char ',' <|> char ')')
This works perfectly, until I add stuff to the end of the input string, at which point it tries to parse that stuff as the next argument, and gets upset when it doesn't end with a comma or bracket.
Fundamentally, the trouble is that a comma is a separator, while a bracket is a terminator. Parsec doesn't appear to provide a combinator for that.
Just to make things more interesting, the input string can also be
foobar(123, 456, ...
which indicates that the message is incomplete. There appears to be no way of parsing a sequence with two possible terminators and knowing which one was found. (I actually want to know whether the argument list was complete or incomplete.)
Can anyone figure out how I climb out of this?
You should exclude your separator/terminator characters from the allowed characters for a function argument. Also, you can use between and sepBy to make the difference between separators and terminators clearer.
However, that alone is probably still not what you want, because it doesn't handle whitespace properly. You should look at Text.Parsec.Token for a more robust way to do this.
Edit
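For the plain case (a complete argument list, no "..." yet), the between/sepBy suggestion might look roughly like the sketch below. The original snippet is not shown here, so treat the type signatures and the exact shape of argument as assumptions rather than the answer's own code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- A call is a name followed by a bracketed, comma-separated argument list.
-- 'between' deals with the terminator, 'sepBy' with the separator.
call :: Parser (String, [String])
call = do
  name <- many1 letter
  args <- between (char '(') (char ')') (argument `sepBy` char ',')
  return (name, args)

-- An argument may contain anything except the separator and the terminator.
argument :: Parser String
argument = many1 (noneOf ",)")
```

Because argument can no longer run past a ',' or ')', trailing input after the closing bracket is simply left unconsumed. Whitespace is kept verbatim, so the second argument of foobar(123, 456, 789) comes back with its leading space.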
With the "..." addition, it indeed becomes a bit weird, and I don't think it nicely fits into any of the predefined combinators, so we'll have to just do it ourselves.
Let's define a type for our results:
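The type definition itself did not survive in the text, so the following is only a guess at its shape: a list-like type with two distinct end markers. The names Args, Arg, Complete, Incomplete and toPair are all made up for illustration.

```haskell
-- Hypothetical result type: like a list of strings, but with two
-- different ways for the list to end.
data Args
  = Arg String Args  -- one argument, followed by the rest of the list
  | Complete         -- the list was closed by ')'
  | Incomplete       -- the list was cut off by "..."
  deriving (Show, Eq)

-- Flatten the structure into the arguments plus a "was it complete?" flag.
toPair :: Args -> ([String], Bool)
toPair (Arg a rest) = let (xs, done) = toPair rest in (a : xs, done)
toPair Complete     = ([], True)
toPair Incomplete   = ([], False)
```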
That's like a list, but it has two different kinds of "empty list" to distinguish the "..." case. Of course, you can also use ([String], Bool) as a result type, but I'll leave that as an exercise.
The following assumes we have
The parsers become:
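Since the original parsers are missing from the text, here is one way they could be written by hand. It is a sketch under a few assumptions: the result type uses the hypothetical names Args, Arg, Complete and Incomplete, the helper names arguments, next and item are invented, and spaces directly after a comma are skipped so that the inputs from the question parse as written.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical result type: a list with two different "empty list" endings.
data Args
  = Arg String Args  -- one argument, followed by the rest of the list
  | Complete         -- the list was closed by ')'
  | Incomplete       -- the list was cut off by "..."
  deriving (Show, Eq)

-- A function name, an opening bracket, then the argument list.
call :: Parser (String, Args)
call = do
  name <- many1 letter
  _ <- char '('
  args <- arguments
  return (name, args)

-- Directly after '(' the list may already be over; otherwise carry on.
arguments :: Parser Args
arguments = (Complete <$ char ')') <|> next

-- After '(' or ',': either the "..." marker or another argument.
next :: Parser Args
next = (Incomplete <$ try (string "...")) <|> item

-- One argument, then a separator (more follows) or the terminator (done).
item :: Parser Args
item = do
  a <- argument
  rest <- (char ',' *> spaces *> next) <|> (Complete <$ char ')')
  return (Arg a rest)

-- Separator and terminator characters are not allowed inside an argument.
argument :: Parser String
argument = many1 (noneOf ",)")
```

The point of the design is that the choice between ',' and ')' is made in one place (item), and each branch returns a different constructor, so the caller can tell which ending was actually seen.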
This handles everything fine except whitespace, for which my original recommendation to look at token parsers remains.
Let's test:
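The original test output is not preserved, so below is a small stand-alone harness instead. It repeats a compact version of the hand-written parsers so that it can be loaded on its own. All names (Args, describe, runExamples and the helpers) are hypothetical, and skipping spaces after a comma is an assumption made so the question's inputs parse.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical result type with two kinds of "end of list".
data Args = Arg String Args | Complete | Incomplete
  deriving (Show, Eq)

call :: Parser (String, Args)
call = (,) <$> many1 letter <*> (char '(' *> arguments)

arguments, next, item :: Parser Args
arguments = (Complete <$ char ')') <|> next
next      = (Incomplete <$ try (string "...")) <|> item
item      = Arg <$> argument
                <*> ((char ',' *> spaces *> next) <|> (Complete <$ char ')'))

argument :: Parser String
argument = many1 (noneOf ",)")

-- The arguments as a plain list.
toList :: Args -> [String]
toList (Arg a rest) = a : toList rest
toList _            = []

-- True only if the list ended with ')'.
complete :: Args -> Bool
complete (Arg _ rest) = complete rest
complete Complete     = True
complete Incomplete   = False

-- Turn a parse result into a short human-readable verdict.
describe :: String -> String
describe input = case parse call "" input of
  Left err -> "parse error: " ++ show err
  Right (name, args) ->
    name ++ ": " ++ status ++ " " ++ show (toList args)
    where status = if complete args then "complete" else "incomplete"

-- Run the inputs from the question, plus one with trailing text.
runExamples :: IO ()
runExamples = mapM_ (putStrLn . describe)
  [ "foobar(123, 456, 789)"
  , "foobar(123, 456, ..."
  , "foobar(123, 456, 789) trailing stuff"
  ]
```

Calling runExamples from GHCi or from main should report the first and third inputs as complete and the second as incomplete. The trailing text in the third input is left unconsumed instead of being mistaken for another argument.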