How to define a Regex in StandardTokenParsers to identify a path?

I am writing a parser for arithmetic expressions such as: /hdfs://xxx.xx.xx.x:xxxx/path1/file1.jpg+1 I want to parse the expression, convert it from infix to postfix, and then evaluate it. I adapted part of the code from another discussion.

    import scala.util.matching.Regex
    import scala.util.parsing.combinator.syntactical.StandardTokenParsers

    class InfixToPostfix extends StandardTokenParsers {
      import lexical._

      // Accept a string-literal token whose contents match the given regex
      def regexStringLit(r: Regex): Parser[String] = acceptMatch(
        "string literal matching regex " + r,
        { case StringLit(s) if r.unapplySeq(s).isDefined => s })

      def pathIdent: Parser[String] = regexStringLit("/hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))".r)

      lexical.delimiters ++= List("+", "-", "*", "/", "^", "(", ")", ",")

      def value: Parser[Expr] = numericLit ^^ { s => Number(s) }
      def variable: Parser[Expr] = pathIdent ^^ { s => Variable(s) }
      def parens: Parser[Expr] = "(" ~> expr <~ ")"

      def argument: Parser[Expr] = expr <~ (","?)
      def func: Parser[Expr] = (pathIdent ~ "(" ~ (argument+) ~ ")" ^^ { case f ~ _ ~ e ~ _ => Function(f, e) })

      def term = (value | parens | func | variable)

      // Needed to define recursive because ^ is right-associative
      def pow: Parser[Expr] = (term ~ "^" ~ pow ^^ { case left ~ _ ~ right => BinaryOperator(left, "^", right) } |
                               term)
      def factor = pow * ("*" ^^^ { (left: Expr, right: Expr) => BinaryOperator(left, "*", right) } |
                          "/" ^^^ { (left: Expr, right: Expr) => BinaryOperator(left, "/", right) })
      def sum = factor * ("+" ^^^ { (left: Expr, right: Expr) => BinaryOperator(left, "+", right) } |
                          "-" ^^^ { (left: Expr, right: Expr) => BinaryOperator(left, "-", right) })
      def expr = (sum | term)

      def parse(s: String) = {
        val tokens = new lexical.Scanner(s)
        phrase(expr)(tokens)
      }

      // and the rest of the code

I was able to solve the following errors with the help of this answer:

      ScalaParser.scala:192: invalid escape character
  [error]     def pathIdent: Parser[String] =regexStringLit("/hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))".r)
  [error]                                                               ^
  [error] ScalaParser.scala:192: invalid escape character
  [error]     def pathIdent: Parser[String] =regexStringLit("/hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))".r)
   [error]                                                                ^
   [error] ScalaParser.scala:192: invalid escape character
   [error]     def pathIdent: Parser[String] =regexStringLit("/hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+))".r)
   [error]                                                                        ^

With the change of pathIdent to this:

  def pathIdent: Parser[String] = regexStringLit("/hdfs://([\\d.]+):(\\d+)/([\\w/]+/(\\w+\\.\\w+))".r)

Now I am getting a runtime error which says:

 [1.1] failure: string literal matching regex /hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+)) expected

/hdfs://111.33.55.2:8888/folder1/p.a3d+1
^

It was working with JavaTokenParsers, but because of the current changes I had to switch to StandardTokenParsers.

There is 1 answer

Lodewijk Bogaards (best answer)

In a double-quoted string, the backslash is an escape character. To put a literal backslash into a double-quoted string you must escape it, so "\d" should be written "\\d".

Furthermore, you do not need to escape the regex dot inside a character class, since the dot has no special meaning within a character class. So "[\d\.]" can simply be "[\d.]".
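
As a quick check of the character-class point, here is a minimal sketch that compares the two spellings on the host part of the sample path. The names `DotInCharClass`, `escapedDot`, `plainDot` and `fullMatch` are made up for illustration; only `scala.util.matching.Regex` from the standard library is assumed.

```scala
import scala.util.matching.Regex

object DotInCharClass {
  // Both patterns use doubled backslashes, as an ordinary
  // double-quoted Scala string requires.
  val escapedDot: Regex = "[\\d\\.]+".r // dot escaped inside the class (redundant)
  val plainDot: Regex   = "[\\d.]+".r   // dot left unescaped inside the class

  // True when the whole input matches the pattern.
  def fullMatch(r: Regex, s: String): Boolean =
    r.pattern.matcher(s).matches()

  def main(args: Array[String]): Unit = {
    val host = "111.33.55.2"
    println(fullMatch(escapedDot, host))
    println(fullMatch(plainDot, host))
    // Inside a class the dot is literal, so it does not stand in for a letter.
    println(fullMatch(plainDot, "111a33"))
  }
}
```

Both patterns should accept the host string, and neither should accept an input containing a letter, which is what you would expect if the unescaped dot were still a wildcard.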

You can also avoid all this escaping by using the raw interpolator or a multi-line string literal with triple quotes.
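
To illustrate, the sketch below writes the path pattern from the question in three ways: with doubled backslashes, with triple quotes, and with the `raw` interpolator. The object and value names are hypothetical, and the sample input is the one from the error message without the trailing `+1`. This only concerns how the pattern is spelled in source code; it assumes nothing beyond the Scala standard library.

```scala
import scala.util.matching.Regex

object PathRegexLiterals {
  // 1. Ordinary string literal: every regex backslash is doubled.
  val escaped: Regex = "/hdfs://([\\d.]+):(\\d+)/([\\w/]+/(\\w+\\.\\w+))".r

  // 2. Triple-quoted literal: backslashes are taken literally.
  val tripleQuoted: Regex = """/hdfs://([\d.]+):(\d+)/([\w/]+/(\w+\.\w+))""".r

  // 3. raw interpolator: escape sequences are not processed either.
  val rawInterpolated: Regex = raw"/hdfs://([\d.]+):(\d+)/([\w/]+/(\w+\.\w+))".r

  def main(args: Array[String]): Unit = {
    // All three literals should produce the same pattern text.
    println(escaped.regex == tripleQuoted.regex)
    println(tripleQuoted.regex == rawInterpolated.regex)

    // A Regex used as an extractor must match the whole input.
    "/hdfs://111.33.55.2:8888/folder1/p.a3d" match {
      case tripleQuoted(host, port, path, file) =>
        println(s"host=$host port=$port path=$path file=$file")
      case _ =>
        println("no match")
    }
  }
}
```

Of the three, the triple-quoted and `raw` forms keep the pattern readable because it appears exactly as the regex engine sees it.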