I am writing a parser using StandardTokenParsers in Scala. I need to create a regex parser to parse a path. I have tested that the regex works, but when I pass it to a function to parse the input, the program gives an error that I cannot figure out. The part of the code related to this parser is as follows:
class InfixToPostfix extends StandardTokenParsers {
  import scala.util.matching.Regex
  import lexical.StringLit
  // parsing the path
  def regexStringLit(r: Regex): Parser[String] =
    acceptMatch("string literal matching regex " + r, { case StringLit(s) if r.unapplySeq(s).isDefined => s })
  // Regex for path
  val pathIdent = """/hdfs://[\d.]+:\d+/[\w/]+/\w+([.+]\w+)+""".r
  def pathIdente: Parser[String] = regexStringLit(pathIdent)
  lexical.delimiters ++= List("+", "-", "*", "/", "^", "(", ")", ",")
  def value: Parser[Expr] = numericLit ^^ { s => Number(s) }
  def variable: Parser[Expr] = pathIdente ^^ { s => Variable(s) }
  def parens: Parser[Expr] = "(" ~> expr <~ ")"
  def argument: Parser[Expr] = expr <~ (","?)
  def func: Parser[Expr] = (pathIdente ~ "(" ~ (argument+) ~ ")" ^^ { case f ~ _ ~ e ~ _ => Function(f, e) })
  // and the rest of the code ....
This parser is meant to parse arithmetic expressions. I pass my input to the program through args(0); the input is: "/hdfs://"
and I get the following error:
[1.1] failure: string literal matching regex /hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+)) expected
I couldn't figure out how to solve it.
FYI: the "+1" part is handled by the parser in the rest of the code, so pathIdent only covers the path, and that is the part causing the trouble. This is also good:
It works fine outside the code when I check it on regexpal.com, but I still get the same error when I use it inside the program.
I am wondering whether StringLit does not accept some of the characters and that is what causes the error. Is there anything other than StringLit that I can use here?
The failure to match will be because the matcher is greedy. This is a common problem with regular expression matching (and hence lexical analysis) in several languages.
The greedy matching catches you at the end of the expression.
The tail of the pattern shown in your error message will fail to match because the word `p`, which should be matched by the `\w` and is represented by the input text `folder1/p`, is swallowed up by the piece `([\w/]+`. That piece stops at the period `.`. There is therefore no word left before the dot to permit `(\w+\.\w+)` to ever match.
You'll have to rethink your regular expression and make each path fragment terminate at a solidus `/` rather than make the solidus part of a set. Do you see?
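To see how much the character class consumes on its own, here is a minimal sketch in Scala. The sample text `folder1/p.txt` and the object name `GreedyClassDemo` are assumptions made for illustration, not taken from the question.

```scala
object GreedyClassDemo {
  // Hypothetical tail of a path, chosen only for illustration.
  val sample = "folder1/p.txt"

  // The character class accepts word characters and slashes alike.
  val wordsAndSlashes = """[\w/]+""".r

  // findPrefixOf returns what the greedy quantifier takes from the start of the input.
  val consumed: Option[String] = wordsAndSlashes.findPrefixOf(sample)

  def main(args: Array[String]): Unit =
    println(consumed)
}
```

Here `consumed` comes out as `Some("folder1/p")`: the class takes the slash and the `p` and stops at the period. The JVM regex engine can backtrack after a greedy match, so it is worth running the full pattern through `unapplySeq` in the same way before deciding which fragment is responsible.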
To make this work, you need to express the path as a sequence of words, each followed by a slash, instead of a single set containing both. This specifies the ordering of the words and slashes and leaves a word unmatched for the following pattern to succeed.
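One way to write such a pattern might look like the sketch below. The exact grouping, the sample input, and the names `PathPatternSketch`, `path`, and `parts` are assumptions for illustration only.

```scala
import scala.util.matching.Regex

object PathPatternSketch {
  // Each directory is a word followed by a slash; the file name is word.extension.
  // (?:...) groups without capturing, so only host, port and path are extracted.
  val path: Regex = """/hdfs://([\d.]+):(\d+)/((?:\w+/)+\w+\.\w+)""".r

  // Hypothetical input used only to exercise the pattern.
  val sample = "/hdfs://127.0.0.1:9000/folder1/folder2/p.txt"

  // A Regex used as an extractor must match the whole string.
  def parts(s: String): Option[(String, String, String)] = s match {
    case path(host, port, file) => Some((host, port, file))
    case _                      => None
  }

  def main(args: Array[String]): Unit =
    println(parts(sample))
}
```

Because the extractor requires a full match, an input with anything left over after the file name is rejected, so a trailing arithmetic part would still have to be handled by the rest of the parser.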