How to match exactly 'n' given characters with FastParse


The FastParse parser-combinator Scala library gives you the .rep(n) 'Repeat' method, which creates a new parser that attempts to parse the given parser n or more times. What's the canonical way to do this if I want exactly n matches?

In my case, I want to parse a 40-character Git commit id - if it were longer than 40 characters, that's not a commit id, and it shouldn't be a match.

The closest example I've found in docs so far is:

val unicodeEscape = P( "u" ~ hexDigit ~ hexDigit ~ hexDigit ~ hexDigit )

...which matches exactly 4 hex digits by simply writing hexDigit out four times. That would be very verbose for a 40-character commit id.

These are parser combinators, not regexes; with a regex, the answer would be something like \p{XDigit}{40}.
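For comparison, the regex version of this check in plain Scala (standard library only, no FastParse) might look like the sketch below. `RegexCommitId` and `isCommitId` are hypothetical names used only for illustration. `String.matches` succeeds only if the pattern matches the entire string, which is what makes a 41-character input fail:

```scala
object RegexCommitId {
  // \p{XDigit} is Java's POSIX-style class for [0-9a-fA-F].
  // String.matches anchors the pattern to the whole input, so
  // 39 or 41 hex digits are rejected, not just non-hex characters.
  def isCommitId(s: String): Boolean = s.matches("\\p{XDigit}{40}")
}
```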


ljd (best answer)

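Since the issue was closed by this commit, rep supports a max keyword argument. It now also supports an exactly keyword argument:

hexDigit.rep(exactly = 40)

On its own, this consumes 40 digits and then stops, so a 41st hex digit is simply left unconsumed rather than causing a failure. To reject longer input, follow it with End, or with a negative lookahead such as !hexDigit when more text can follow the id.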
Roberto Tyley

Ah, looks like it's not currently available, but is a known 'missing feature' for FastParse:

https://github.com/lihaoyi/fastparse/issues/27

Kolmar

Well, even if this functionality is not available now, you can write a function that applies ~ a given number of times:

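def repExactly(parser: Parser[Unit])(times: Int): Parser[Unit] = {
  require(times >= 1, "times must be at least 1")
  // Builds parser ~ parser ~ ... ~ parser (times copies)
  Iterator.iterate(parser)(_ ~ parser).drop(times - 1).next()
}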

Here is a small test:

object Main extends App {

  import fastparse._

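  def repExactly(parser: Parser[Unit])(times: Int): Parser[Unit] = {
    require(times >= 1, "times must be at least 1")
    // Builds parser ~ parser ~ ... ~ parser (times copies)
    Iterator.iterate(parser)(_ ~ parser).drop(times - 1).next()
  }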

  val hexDigit = P( CharIn('0'to'9', 'a'to'f', 'A'to'F') )
  def fiveHexDigits = repExactly(hexDigit)(5) ~ End

  println(fiveHexDigits.parse("123a"))
  println(fiveHexDigits.parse("123ab"))
  println(fiveHexDigits.parse("123abc"))

}

And the output is

Failure(hexDigit:4 / CharIn("0123456789abcdefABCDEF"):4 ..."", false)
Success((), 5)
Failure(End:5 ..."c", false)
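To see the construction in isolation, here is a minimal sketch that does not depend on FastParse at all. `MiniParser`, `hexDigit`, `end`, `repExactlyFold` and `fullMatch` are hypothetical stand-ins written for this illustration, not FastParse's API. It uses the same `Iterator.iterate(parser)(_ ~ parser).drop(times - 1).next()` trick as `repExactly` above, plus an equivalent `List.fill(times)(parser).reduce(_ ~ _)` formulation:

```scala
object RepExactlySketch {
  // Hypothetical stand-in for a Parser[Unit] (NOT FastParse's type):
  // given the input and a start index, return Some(next index) or None.
  final case class MiniParser(run: (String, Int) => Option[Int]) {
    // Sequencing: run this parser, then `that` from where this one stopped.
    def ~(that: MiniParser): MiniParser =
      MiniParser((s, i) => run(s, i).flatMap(j => that.run(s, j)))
  }

  private val HexChars = "0123456789abcdefABCDEF"

  // Consumes a single ASCII hex digit.
  val hexDigit: MiniParser = MiniParser { (s, i) =>
    if (i < s.length && HexChars.indexOf(s.charAt(i)) >= 0) Some(i + 1) else None
  }

  // Succeeds only at the end of the input, playing the role of End.
  val end: MiniParser = MiniParser((s, i) => if (i == s.length) Some(i) else None)

  // Same construction as repExactly: parser ~ parser ~ ... (times copies).
  def repExactly(parser: MiniParser)(times: Int): MiniParser = {
    require(times >= 1, "times must be at least 1")
    Iterator.iterate(parser)(_ ~ parser).drop(times - 1).next()
  }

  // Equivalent formulation: make `times` copies and chain them with ~.
  def repExactlyFold(parser: MiniParser)(times: Int): MiniParser = {
    require(times >= 1, "times must be at least 1")
    List.fill(times)(parser).reduce(_ ~ _)
  }

  // True if `p` followed by `end` matches the whole input.
  def fullMatch(p: MiniParser, s: String): Boolean = (p ~ end).run(s, 0).isDefined
}
```

With this sketch, `fullMatch(repExactly(hexDigit)(5), "123ab")` is true while "123a" and "123abc" are rejected, mirroring the FastParse output above.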

And here is a generic way to implement this functionality as a * operator on Parser. The original implementation of rep does something quite convoluted, so my implementation may not account for some cases, and I haven't tested how it behaves with arguments that contain cuts:

object Main extends App {

  import fastparse._

  implicit class ParserExtension[T](parser: Parser[T]) {
    def *[R] (times: Int)(implicit ev: Implicits.Repeater[T, R]): Parser[R] = {
      assert(times >= 1)

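      // First step: run the parser once and store its result in a fresh accumulator
      Iterator.iterate(parser map { t =>
        val acc = ev.initial
        ev.accumulate(t, acc)
        acc
      }){ prev: Parser[ev.Acc] =>
        // Each further step: run one more copy of the parser and accumulate its result
        (prev ~ parser) map {
          case (acc, t) =>
            ev.accumulate(t, acc)
            acc
        }
      }.drop(times - 1).next() map (acc => ev.result(acc)) // turn the accumulator into the final R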
    }
  }

  val hexDigit = P( CharIn('0'to'9', 'a'to'f', 'A'to'F') )

  val fiveDigitsSeq = (hexDigit.! * 5) ~ End

  println(fiveDigitsSeq.parse("123a"))   // Failure ...
  println(fiveDigitsSeq.parse("123ab"))  // Success(ArrayBuffer(1, 2, 3, a, b), 5)
  println(fiveDigitsSeq.parse("123abc")) // Failure ...
  println()

  val fiveDigitsStr = (hexDigit * 5).! ~ End

  println(fiveDigitsStr.parse("123a"))   // Failure ...
  println(fiveDigitsStr.parse("123ab"))  // Success(123ab, 5)
  println(fiveDigitsStr.parse("123abc")) // Failure ...
}
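The same idea can also collect results without a mutable accumulator like Repeater's. The sketch below uses a hypothetical, library-free `MiniParser[A]` (not FastParse's Parser) whose `*` operator threads an immutable Vector through the chain built by `Iterator.iterate`. It is only meant to show the shape of the technique, not to replace rep; `TimesSketch`, `hexDigit` and `parseAll` are illustrative names:

```scala
object TimesSketch {
  // Hypothetical typed parser (NOT FastParse's Parser):
  // on success returns (value, next index), otherwise None.
  final case class MiniParser[A](run: (String, Int) => Option[(A, Int)]) {
    def map[B](f: A => B): MiniParser[B] =
      MiniParser((s, i) => run(s, i).map { case (a, j) => (f(a), j) })

    // Sequence two parsers and keep both results as a pair.
    def ~[B](that: MiniParser[B]): MiniParser[(A, B)] =
      MiniParser((s, i) => run(s, i).flatMap { case (a, j) =>
        that.run(s, j).map { case (b, k) => ((a, b), k) }
      })

    // Exactly `times` repetitions, collecting the results in order.
    def *(times: Int): MiniParser[Vector[A]] = {
      require(times >= 1, "times must be at least 1")
      val first: MiniParser[Vector[A]] = map(a => Vector(a))
      Iterator
        .iterate(first)(prev => (prev ~ this).map { case (acc, a) => acc :+ a })
        .drop(times - 1)
        .next()
    }
  }

  private val HexChars = "0123456789abcdefABCDEF"

  // Consumes one ASCII hex digit and returns it.
  val hexDigit: MiniParser[Char] = MiniParser { (s, i) =>
    if (i < s.length && HexChars.indexOf(s.charAt(i)) >= 0) Some((s.charAt(i), i + 1))
    else None
  }

  // Runs p from index 0 and succeeds only if it consumed the whole input.
  def parseAll[A](p: MiniParser[A], s: String): Option[A] =
    p.run(s, 0).collect { case (a, j) if j == s.length => a }
}
```

Using an immutable Vector keeps each step a pure function of the previous result, at the cost of a small allocation per repetition.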