String ending with a character in parboiled2, when the string can contain that character


I've come across a tricky problem while writing a parboiled2 parser: I need to match the part of a line that is a string whose end is marked by a : character. This would be easy enough, except that the string itself can contain : characters.

At the moment I have the following, which treats the string as a sequence of colon-terminated strings and concatenates them. However, it also consumes (and captures) the trailing :, which I don't want, because the trailing : is not part of the string itself.

def address = rule { capture(oneOrMore(zeroOrMore(noneOf(":")) ~ ":")) }

I feel like I should be using &(":") somewhere in here but I'm struggling to work that in while matching the interstitial : characters.

Example successful matches (as part of a longer string):

  • localhost: -> localhost
  • 1::: -> 1::
  • ::: -> ::

Mismatches:

  • :

Any suggestions would be welcome, even if it's "you can't do this" so I can stop racking my brains.


The context for this is parsing the bind setting in an HAProxy configuration file. Given the following (simplified) case classes, some examples of valid strings are:

case class Bind(endpoint: Endpoint, params: Seq[String])
case class Endpoint(address: Option[String], port: Option[Int])

  • bind :80 -> Bind(Endpoint(None, Some(80)), Seq())
  • bind localhost:80 -> Bind(Endpoint(Some("localhost"), Some(80)), Seq())
  • bind localhost -> Bind(Endpoint(Some("localhost"), None), Seq())
  • bind :80 param1 -> Bind(Endpoint(None, Some(80)), Seq("param1"))

In other words, if there is an address string, it must end before the final :, because that last colon indicates that a port follows. The endpoint rule looks something like this:

def endpoint = rule { optional(address) ~ optional(':' ~ int) ~> Endpoint }

Ultimately, the matchable string for the endpoint is terminated by either a space or the end of the line, so one option would be to capture everything up to the space and then parse that string separately, but I was hoping to do it all within the main parser.
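For reference, the fallback described here (split the line on whitespace first, then parse the endpoint token on its own) can be sketched in plain Scala without parboiled2. This is a minimal sketch that assumes the endpoint token never contains whitespace and treats the last ':' in the token as the port separator; BindSplit, parseEndpoint and parseBind are hypothetical names, and the case classes repeat the simplified ones above.

```scala
import scala.util.Try

object BindSplit {
  final case class Endpoint(address: Option[String], port: Option[Int])
  final case class Bind(endpoint: Endpoint, params: Seq[String])

  // Split one whitespace-free endpoint token at its LAST ':'.
  // Everything before that colon (which may itself contain ':') is the address.
  def parseEndpoint(token: String): Option[Endpoint] = {
    val i = token.lastIndexOf(':')
    if (i < 0) {
      // No colon at all: the whole token is the address and there is no port
      if (token.isEmpty) None else Some(Endpoint(Some(token), None))
    } else {
      val addr    = token.substring(0, i)
      val portStr = token.substring(i + 1)
      // Require a non-empty, all-digit port that fits in an Int
      val port =
        if (portStr.nonEmpty && portStr.forall(_.isDigit)) Try(portStr.toInt).toOption
        else None
      port.map(p => Endpoint(if (addr.isEmpty) None else Some(addr), Some(p)))
    }
  }

  // Assumption: the line is "bind <endpoint> [params...]" separated by whitespace
  def parseBind(line: String): Option[Bind] =
    line.trim.split("\\s+").toList match {
      case "bind" :: token :: params => parseEndpoint(token).map(e => Bind(e, params))
      case _                         => None
    }
}
```

Because only the last colon is treated as the separator, bind :::80 would come out with the address :: and port 80, while a token such as localhost: (colon but no port) is rejected.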

Accepted answer, by cbley:

I think that the following should work for your problem description:

def noColons = rule { zeroOrMore(noneOf(":")) }
def colonWithNext = rule { ':' ~ &(noColons ~ ':') }
def address = rule { capture(oneOrMore(noColons).separatedBy(colonWithNext)) ~ ':' }
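To see how the lookahead behaves without running parboiled2, the three rules can be mirrored by hand as cursor-passing functions. This is a plain-Scala illustration, not parboiled2 code, written under the assumption that it reproduces the PEG semantics involved: a failed colonWithNext leaves the cursor where it was, and &(...) inspects input without consuming it. The object name AnswerRules is made up for the example.

```scala
object AnswerRules {
  // noColons = zeroOrMore(noneOf(":")): always succeeds; returns the cursor after the run
  def noColons(s: String, pos: Int): Int = {
    var i = pos
    while (i < s.length && s(i) != ':') i += 1
    i
  }

  // colonWithNext = ':' ~ &(noColons ~ ':')
  // Consumes just the ':' and only if another ':' follows; the lookahead consumes nothing.
  def colonWithNext(s: String, pos: Int): Option[Int] =
    if (pos < s.length && s(pos) == ':') {
      val j = noColons(s, pos + 1)
      if (j < s.length && s(j) == ':') Some(pos + 1) else None
    } else None

  // address = capture(oneOrMore(noColons).separatedBy(colonWithNext)) ~ ':'
  // Returns the captured text and the cursor after the terminating ':'.
  def address(s: String, start: Int = 0): Option[(String, Int)] = {
    @annotation.tailrec
    def loop(end: Int): Int = colonWithNext(s, end) match {
      case Some(next) => loop(noColons(s, next)) // separator matched: match another noColons
      case None       => end                     // separator failed: cursor stays before the ':'
    }
    val end = loop(noColons(s, start))
    if (end < s.length && s(end) == ':') Some((s.substring(start, end), end + 1)) else None
  }
}
```

Tracing the question's examples through this mirror gives localhost for localhost:, 1:: for 1::: and :: for :::. Two things are worth checking against the real grammar: a lone : produces an empty capture rather than the mismatch the question asks for, and because noneOf(":") also matches spaces, the lookahead can see a ':' further along the line (for localhost:80 x:y the capture is localhost:80 x).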

The problem with your code was the use of the ~ combinator. An expression like A ~ B only matches if A matches and then B matches on the input that remains; if A has already consumed the input that B needs (here, the final ':'), the sequence fails at B. There is no backtracking involved: a parboiled2 parser only backtracks on alternatives (|), and repetitions such as zeroOrMore/oneOrMore are greedy.

So, in this case you have to make sure to consume a ':' only if another ':' follows it (possibly after some non-colon characters), which is what the &(...) lookahead in colonWithNext checks.
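Applied to the bind context from the question, the same idea could look roughly like the sketch below. It assumes parboiled2 2.x on the classpath, and rule names such as Segment, InnerColon and BindLine are hypothetical. It differs from the answer in three ways: the segment also excludes spaces so the lookahead cannot run past the endpoint token, the address rule leaves the final ':' unconsumed so optional(':' ~ Port) can match it, and an empty capture (as in :80) becomes None.

```scala
import org.parboiled2._

case class Endpoint(address: Option[String], port: Option[Int])
case class Bind(endpoint: Endpoint, params: Seq[String])

// Sketch only: assumes parboiled2 2.x; rule names are illustrative.
class BindParser(val input: ParserInput) extends Parser {
  // Address characters: anything except ':' and space, so the lookahead stays inside the token
  def Segment = rule { zeroOrMore(noneOf(": ")) }

  // Same idea as colonWithNext: consume a ':' only if another ':' follows in this token
  def InnerColon = rule { ':' ~ &(Segment ~ ':') }

  // Does NOT consume the final ':', leaving it for the port; an empty capture becomes None
  def Address = rule {
    capture(oneOrMore(Segment).separatedBy(InnerColon)) ~>
      ((s: String) => if (s.isEmpty) None else Some(s))
  }

  def Port = rule { capture(oneOrMore(CharPredicate.Digit)) ~> ((s: String) => s.toInt) }

  def EndpointRule = rule {
    Address ~ optional(':' ~ Port) ~> ((a: Option[String], p: Option[Int]) => Endpoint(a, p))
  }

  def Param = rule { capture(oneOrMore(noneOf(" "))) }

  def BindLine = rule {
    "bind" ~ oneOrMore(' ') ~ EndpointRule ~ zeroOrMore(oneOrMore(' ') ~ Param) ~ EOI ~>
      ((e: Endpoint, ps: Seq[String]) => Bind(e, ps))
  }
}
```

Usage would be new BindParser("bind :80 param1").BindLine.run(), which returns a Try[Bind] with the default delivery scheme. Note that with a single-colon rule such as this, an address ending in digits after its last colon is always read as address plus port; that ambiguity is inherent in the format, not in the parser.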