How to solve an error related to creating parser from regex?

Question

181 views Asked by Mahsa At 17 June 2015 at 20:56

I am writing a parser using StandardTokenParsers in Scala and need to create a regex parser to parse a path. I have tested the regex and it works fine on its own, but when I pass it to a function to parse the input, the program gives an error that I am not able to figure out. The part of the code related to this parser is as follows:

 import scala.util.parsing.combinator.syntactical.StandardTokenParsers

 class InfixToPostfix extends StandardTokenParsers {
   import scala.util.matching.Regex
   import lexical.StringLit

   // Accept a string literal token whose content matches the whole regex
   def regexStringLit(r: Regex): Parser[String] =
     acceptMatch("string literal matching regex " + r, { case StringLit(s) if r.unapplySeq(s).isDefined => s })

   // Regex for the path
   val pathIdent = """/hdfs://[\d.]+:\d+/[\w/]+/\w+([.+]\w+)+""".r
   def pathIdente: Parser[String] = regexStringLit(pathIdent)

   lexical.delimiters ++= List("+", "-", "*", "/", "^", "(", ")", ",")
   def value: Parser[Expr] = numericLit ^^ { s => Number(s) }
   def variable: Parser[Expr] = pathIdente ^^ { s => Variable(s) }
   def parens: Parser[Expr] = "(" ~> expr <~ ")"

   def argument: Parser[Expr] = expr <~ (","?)
   def func: Parser[Expr] = (pathIdente ~ "(" ~ (argument+) ~ ")" ^^ { case f ~ _ ~ e ~ _ => Function(f, e) })
   // and the rest of the code ...

This parser is meant to parse arithmetic operations. I pass my input to the program through args(0), and the input is: "/hdfs://111.33.55.2:8888/folder1/p.a3d+1"

and I get the following error:

[1.1] failure: string literal matching regex /hdfs://([\d\.]+):(\d+)/([\w/]+/(\w+\.\w+)) expected

 /hdfs://111.33.55.2:8888/folder1/p.a3d
 ^

I couldn't figure out how to solve it.

FYI: The "+1" part is handled by the parser elsewhere in the code, so pathIdent covers only the path, and that is the part causing the trouble. This regex is also good:

  """/hdfs://\d+(\.\d+){3}:\d+(/(\w+([.+]\w+)*))+""".r

It works fine outside the code when I check it on regexpal.com, but I still get the same error when I use it inside the program.

I am wondering whether StringLit does not accept some of the characters and is causing the error. Is there anything other than StringLit that I can use here?

There is 1 answer

**Brian Tompsett - 汤莱恩** · Accepted Answer · 2015-06-17T21:12:26+00:00

The failure to match will be because the matcher is greedy. This is a common problem with regular expression matching (and hence lexical analysis) in several languages.

The greedy matching catches you at the end of the expression.

You have ([\w/]+/(\w+\.\w+)), but this will fail to match: in the input text folder1/p.a3d, the word p is swallowed up by the piece [\w/]+, which consumes folder1/p and stops at the period. There is therefore no word left before the dot for (\w+\.\w+) to match.
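
The greedy consumption described here can be observed directly with scala.util.matching.Regex. This is only a minimal sketch: the object GreedyPrefixDemo and the helper greedyPrefix are hypothetical names, not part of the asker's code.

```scala
object GreedyPrefixDemo {
  // Hypothetical helper: returns the prefix of `input` that the greedy
  // character class [\w/]+ consumes when matching from the start.
  def greedyPrefix(input: String): Option[String] =
    """[\w/]+""".r.findPrefixOf(input)

  def main(args: Array[String]): Unit = {
    // Prints Some(folder1/p): the class takes the word p as well and
    // stops only at the period.
    println(greedyPrefix("folder1/p.a3d"))
  }
}
```

This shows only the first greedy attempt. The java.util.regex engine behind scala.util.matching.Regex can backtrack from that point, so it is worth testing the complete pattern against the complete input as well.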

You'll have to rethink your regular expression and make each path fragment terminate at a solidus / rather than make it part of a set.

Do you see?

To make this work, you need to express it in the following way:

"""/hdfs://[\d.]+:\d+/(\w+/)+\w+([.+]\w+)+""".r

Here I replaced [\w/]+/ with (\w+/)+. This now specifies the ordering of the words and slashes and leaves a word unmatched for the following pattern to succeed.
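
A candidate pattern can be checked outside the parser with the same whole-string test that regexStringLit uses (unapplySeq). The following is a sketch under that assumption; PathRegexCheck, matchesWhole and the two value names are hypothetical. It compares a segment pattern written as (\w+/)+ with one written as (\w/)+, which accepts only single-character directory names and so cannot match folder1/.

```scala
import scala.util.matching.Regex

object PathRegexCheck {
  // Sample path from the question, without the trailing "+1".
  val samplePath = "/hdfs://111.33.55.2:8888/folder1/p.a3d"

  // Each directory is one or more word characters followed by a slash.
  val multiCharSegments: Regex =
    """/hdfs://[\d.]+:\d+/(\w+/)+\w+([.+]\w+)+""".r

  // Each directory is exactly one word character followed by a slash.
  val singleCharSegments: Regex =
    """/hdfs://[\d.]+:\d+/(\w/)+\w+([.+]\w+)+""".r

  // Same test as regexStringLit: the regex must match the whole string.
  def matchesWhole(r: Regex, s: String): Boolean =
    r.unapplySeq(s).isDefined

  def main(args: Array[String]): Unit = {
    println(matchesWhole(multiCharSegments, samplePath))   // true
    println(matchesWhole(singleCharSegments, samplePath))  // false
    // The single-character form only matches paths such as /f/p.a3d.
    println(matchesWhole(singleCharSegments, "/hdfs://111.33.55.2:8888/f/p.a3d")) // true
  }
}
```

This checks the regex in isolation only. Inside the parser the pattern is applied to the content of a StringLit token, so the result also depends on how the lexer splits the input into tokens.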
