How to match all words in a sentence with Scala parser combinators?


For a first test with Scala parser combinators, I am trying to get all words from a sentence, but I only get "None" from the following code:

```scala
import java.io.File
import scala.io.Source
import scala.util.parsing.combinator._

object PgnReader extends TagParser {

  def parseFile(inputFile:File) = {
    val pgnStream = Source.fromFile(inputFile)
    val pgnStr = pgnStream.mkString
    println(parseAll(tag, "Hello World !").getOrElse("None"))
    pgnStream.close
  }
}

trait TagParser extends RegexParsers {
  val tag:Parser[String] = """[:alpha:]+""".r ^^ (_.toString)
}
```

I would like to get something like:

```
Hello
World
```

or even:

```
List(Hello, World)
```

Am I on the right track with my code?

I am using Scala 2.11 and the scala-parser-combinators library.


There are 2 answers

Answer by Allen Luce (best answer)

I think this might get you closer:

```scala
trait TagParser extends RegexParsers {
  // rep(...) applies the letter-run regex repeatedly and yields a List[String]
  val tag = rep("""\p{Alpha}+""".r) ^^ (_.map(_.toString))
}
```

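POSIX character classes have a different syntax in Scala regexes, which are inherited from Java's `java.util.regex`: the alpha class is written `\p{Alpha}`, not `[:alpha:]`. In a Java regex, `[:alpha:]` is just an ordinary bracket class matching one of the characters `:`, `a`, `l`, `p` or `h`, so it cannot match the `H` at the start of "Hello" and the parse fails. The `rep()` combinator allows multiple occurrences and returns their results as a `List`.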
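To see the difference outside the parser, here is a small sketch that uses only the standard library's `scala.util.matching.Regex` and compares every match of both patterns on the question's input. The names `PosixClassDemo`, `literalClass`, `posixAlpha` and `words` are hypothetical, chosen for illustration.

```scala
import scala.util.matching.Regex

object PosixClassDemo {
  // In java.util.regex, [:alpha:] is an ordinary bracket class that matches
  // one of the characters ':', 'a', 'l', 'p', 'h'; it is NOT the POSIX class.
  val literalClass: Regex = """[:alpha:]+""".r

  // \p{Alpha} is Java's spelling of the POSIX alpha class (US-ASCII letters).
  val posixAlpha: Regex = """\p{Alpha}+""".r

  // All non-overlapping matches the regex finds anywhere in the input
  def words(re: Regex, s: String): List[String] = re.findAllIn(s).toList
}
```

On "Hello World !", `posixAlpha` finds `List(Hello, World)`, while `literalClass` only finds the fragments `ll` and `l`. A `RegexParsers` regex parser additionally requires the match to start at the current position (as `findPrefixMatchOf` does), so `[:alpha:]+` fails right at the `H`.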

That will still fail on the exclamation point, so you can extend the regex a little. I'd also separate the notions of "tag" and "tags" to make things clearer:

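```scala
trait TagParser extends RegexParsers {
  // A single token: a run of letters, or a literal "!"
  val tag: Parser[String] = """\p{Alpha}+|!""".r
  // Zero or more tokens, collected into a List (tag is declared first to avoid init-order surprises)
  val tags: Parser[List[String]] = rep(tag)
}
...
println(parseAll(tags, "Hello World !").getOrElse(None)) // prints List(Hello, World, !)
...
```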
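If, as in the question, you want only the words and not the punctuation, one option is to accept punctuation as a separate token and drop it afterwards. This is a minimal sketch, assuming the scala-parser-combinators module is on the classpath; the names `WordParser`, `punct`, `token` and `parseWords`, and the exact set of punctuation characters, are hypothetical choices for illustration.

```scala
import scala.util.parsing.combinator.RegexParsers

object WordParser extends RegexParsers {
  // A word: one or more ASCII letters (Java's POSIX-style \p{Alpha} class)
  val word: Parser[String] = """\p{Alpha}+""".r

  // Punctuation to accept but discard (hypothetical set of characters)
  val punct: Parser[String] = """[!?.,;:]""".r

  // Each token is Some(word) for a word, or None for punctuation
  val token: Parser[Option[String]] =
    (word ^^ (w => Option(w))) | (punct ^^^ Option.empty[String])

  // Zero or more tokens; flatten keeps only the words, in order
  val words: Parser[List[String]] = rep(token) ^^ (_.flatten)

  // Some(list of words) if the whole input parses, otherwise None
  def parseWords(s: String): Option[List[String]] =
    parseAll(words, s) match {
      case Success(ws, _) => Some(ws)
      case _              => None
    }
}
```

With this, `WordParser.parseWords("Hello World !")` gives `Some(List(Hello, World))`. Whitespace between tokens needs no special handling because `RegexParsers` skips it before each regex parser by default.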
Answer by dk14

You should use something like this to match a sequence of tokens instead of a single token:

```scala
trait TagParser extends RegexParsers {
  val tags: Parser[List[String]] = rep("""[a-zA-Z]+""".r)
}
```
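The difference between `parse` and `parseAll` explains why the trailing `!` still matters with this parser. This is a small sketch, assuming the scala-parser-combinators module is available; `TokenSeqParser`, `prefix` and `whole` are hypothetical names, and the parser is written as an object so it can be called directly.

```scala
import scala.util.parsing.combinator.RegexParsers

object TokenSeqParser extends RegexParsers {
  // Same grammar as above: zero or more runs of ASCII letters
  val tags: Parser[List[String]] = rep("""[a-zA-Z]+""".r)

  // parse accepts a successful prefix: rep stops before "!" and still succeeds
  val prefix: ParseResult[List[String]] = parse(tags, "Hello World !")

  // parseAll must consume the whole input, so the leftover "!" makes it fail
  val whole: ParseResult[List[String]] = parseAll(tags, "Hello World !")
}
```

Here `prefix.get` is `List(Hello, World)` and `prefix.next` still points at the remaining ` !`, whereas `whole` is a failure. With `parseAll` you therefore need to either accept the punctuation in the grammar or remove it before parsing.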

From the `RegexParsers` documentation, `rep` is:

A parser generator for repetitions.

rep(p) repeatedly uses p to parse the input until p fails (the result is a List of the consecutive results of p).
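The "until p fails" part also covers zero repetitions: `rep` does not fail just because `p` never matches; it returns an empty `List` and consumes nothing. A minimal sketch illustrating this, assuming the scala-parser-combinators module; the object and value names are hypothetical.

```scala
import scala.util.parsing.combinator.RegexParsers

object RepSemantics extends RegexParsers {
  val word: Parser[String] = """[a-zA-Z]+""".r

  // Empty input: zero iterations, still a success with Nil
  val onEmpty: ParseResult[List[String]] = parseAll(rep(word), "")

  // First attempt fails: success with Nil, input position unchanged
  val noMatch: ParseResult[List[String]] = parse(rep(word), "123")

  // Collects consecutive results in order, stopping where word fails at "42"
  val partial: ParseResult[List[String]] = parse(rep(word), "one two 42")
}
```

`partial` succeeds with `List(one, two)` and leaves the input positioned right after `two`; this is why `parseAll` is the tool that reports leftover, unparsed text.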

http://www.scala-lang.org/files/archive/nightly/docs/parser-combinators/index.html#scala.util.parsing.combinator.RegexParsers