I am writing a parser using StandardTokenParsers in Scala. I need to create a regex parser to parse a path. I have tested the regex and it works, but when I pass it to a function to do the parsing, the program gives an error that I cannot figure out. The part of the code related to this parser is as follows:
import scala.util.parsing.combinator.syntactical.StandardTokenParsers
class InfixToPostfix extends StandardTokenParsers {
  import scala.util.matching.Regex
  import lexical.StringLit
  // Parsing the path: accept a StringLit token whose contents match the regex
  def regexStringLit(r: Regex): Parser[String] =
    acceptMatch("string literal matching regex " + r,
      { case StringLit(s) if r.unapplySeq(s).isDefined => s })
  // Regex for the path
  val pathIdent = """/hdfs://[\d.]+:\d+/[\w/]+/\w+([.+]\w+)+""".r
  def pathIdente: Parser[String] = regexStringLit(pathIdent)
  lexical.delimiters ++= List("+", "-", "*", "/", "^", "(", ")", ",")
  // Expr, Number, Variable and Function are defined in the rest of the code
  def value: Parser[Expr] = numericLit ^^ { s => Number(s) }
  def variable: Parser[Expr] = pathIdente ^^ { s => Variable(s) }
  def parens: Parser[Expr] = "(" ~> expr <~ ")"
  def argument: Parser[Expr] = expr <~ (","?)
  def func: Parser[Expr] = (pathIdente ~ "(" ~ (argument+) ~ ")" ^^ { case f ~ _ ~ e ~ _ => Function(f, e) })
  //and the rest of the code ....
This parser is meant to parse arithmetic operations. I use args(0) to pass my input to the program, which is: "/hdfs://111.33.55.2:8888/folder1/p.a3d+1"
and I get the following error:
[1.1] failure: string literal matching regex /hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+)) expected
/hdfs://111.33.55.2:8888/folder1/p.a3d
^
I couldn't figure out how to solve it.
FYI: the "+1" part is handled by the rest of the parser, so "pathIdent" is only for the path, and that is the part causing the trouble. This regex is also good:
"""/hdfs://\d+(\.\d+){3}:\d+(/(\w+([.+]\w+)*))+""".r
It works fine outside the code when I check it at regexpal.com, but I still get the same error when I use it inside the program.
I am wondering whether StringLit does not accept some of the characters and is what causes the error. Is there anything other than StringLit that I can use here?
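For reference, the claim that the regex itself is fine can be checked in isolation from the parser. This is only a minimal sketch using scala.util.matching.Regex; the object name PathRegexCheck and the helper matchesWhole are made up for illustration and are not part of the program above.

```scala
import scala.util.matching.Regex

object PathRegexCheck {
  // The two patterns quoted in the question.
  val pathIdent: Regex = """/hdfs://[\d.]+:\d+/[\w/]+/\w+([.+]\w+)+""".r
  val altIdent: Regex = """/hdfs://\d+(\.\d+){3}:\d+(/(\w+([.+]\w+)*))+""".r

  // Path portion of the sample input, without the trailing "+1".
  val samplePath = "/hdfs://111.33.55.2:8888/folder1/p.a3d"

  // True when the whole string matches, which is the same condition the
  // guard r.unapplySeq(s).isDefined tests inside regexStringLit.
  def matchesWhole(r: Regex, s: String): Boolean =
    r.pattern.matcher(s).matches()

  def main(args: Array[String]): Unit = {
    println(matchesWhole(pathIdent, samplePath))
    println(matchesWhole(altIdent, samplePath))
  }
}
```

If both checks succeed here but the parser still fails, the regex is probably not what rejects the input, and the tokens reaching regexStringLit are worth inspecting.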
The failure to match will be because the matcher is greedy. This is a common problem with regular expression matching (and hence lexical analysis) in several languages.
The greedy matching catches you at the end of the expression.
You have
([\w/]+/(\w+\.\w+))
but this will fail to match because the word p, matched by the \w and represented by the input text folder1/p, is swallowed up by the piece ([\w/]+ . It stops at the period (.). There is therefore no word before the dot to permit (\w+\.\w+) to ever match.
You'll have to rethink your regular expression and make each path fragment terminate at a solidus (/) rather than make it part of a set. Do you see?
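How far the character class reaches can be observed directly. The sketch below assumes java.util.regex semantics, which scala.util.matching.Regex delegates to. The object name GreedyVsPossessive and the possessive variant [\w/]++ are illustrative additions, not part of the original pattern. With the possessive quantifier the class keeps folder1/p and the match fails in the way described; with the plain greedy quantifier the engine is allowed to backtrack and give the p back, so it is worth running both before deciding which behaviour applies inside the parser.

```scala
import scala.util.matching.Regex

object GreedyVsPossessive {
  // Tail of the pattern as quoted above: a greedy character class that
  // includes the slash, followed by the final name.ext part.
  val greedy: Regex = """([\w/]+/(\w+\.\w+))""".r

  // Hypothetical variant for comparison: ++ is possessive, so the class
  // keeps everything it consumed and never backtracks.
  val possessive: Regex = """([\w/]++/(\w+\.\w+))""".r

  // The part of the sample input that these fragments have to cover.
  val tail = "folder1/p.a3d"

  // Whole-string match, as Regex.unapplySeq performs by default.
  def matchesWhole(r: Regex, s: String): Boolean =
    r.pattern.matcher(s).matches()

  def main(args: Array[String]): Unit = {
    println(s"greedy:     ${matchesWhole(greedy, tail)}")
    println(s"possessive: ${matchesWhole(possessive, tail)}")
  }
}
```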
To make this work, you need to express it by replacing
[\w/]+/
with
(\w+/)+
This now specifies the ordering of the words and slashes and leaves a word unmatched for the following pattern to succeed.
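As a rough sketch of the rewritten pattern: the full regex below is an assumption, taken from the error message in the question with [\w/]+/ swapped for (\w+/)+ (one or more word characters, then a slash, repeated). The object name OrderedPathPattern and the parse helper are hypothetical and only serve to show the capture groups.

```scala
import scala.util.matching.Regex

object OrderedPathPattern {
  // Assumed full pattern: the regex from the error message, with the
  // directory part written as ordered "word then slash" fragments.
  val path: Regex = """/hdfs://([\d\.]+):(\d+)/((\w+/)+(\w+\.\w+))""".r

  // Returns (host, port, filePath) when the whole string matches.
  // The last two groups (last directory fragment, file name) are ignored.
  def parse(s: String): Option[(String, String, String)] = s match {
    case path(host, port, file, _, _) => Some((host, port, file))
    case _                            => None
  }

  def main(args: Array[String]): Unit =
    println(parse("/hdfs://111.33.55.2:8888/folder1/p.a3d"))
}
```

Written this way, the pattern requires at least one directory fragment before the file name, so a path such as /hdfs://111.33.55.2:8888/p.a3d would not match.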