VBScript Grammar: How to model sub call without parentheses


I'm writing a GOLD Parser grammar for VBScript. Here is an extract:

<CallStmt>             ::= 'Call' <CallExpr>
                         | <CallExpr> <ParameterList>
                         !| <CallExpr> '(' <Expr> ')'
                         | <CallExpr> '(' ')'

<AssignStmt>           ::= <CallExpr> '=' <Expr>
                         | 'Set' <CallExpr> '=' <Expr>
                         | 'Set' <CallExpr> '=' 'New' <CallExpr>

<CallExpr>             ::= '.' <LeftExpr>
                         | <LeftExpr>

<LeftExpr>             ::= ID
                         | IDDot <LeftExpr>
                         | ID '(' <ParameterList> ')'
                         | ID '(' <ParameterList> ').' <LeftExpr>

! VBScript allows arguments to be skipped, e.g. a(1,,2)
<ParameterList>        ::= <Expr> ',' <ParameterList>
                         | ',' <ParameterList> 
                         | <Expr>
                         |

! Value can be reduced from <Expr>                       
<Value>                ::= <ConstExpr>
                         | <CallExpr>
                         | '(' <Expr> ')'                        

I get a conflict on the <CallStmt> ::= <CallExpr> <ParameterList> rule, which describes calling a sub without surrounding parentheses. For example, the following statements are all syntactically correct:

obj.sub1(1, 2).sub2 1, 2
obj.sub1(1, 2).sub2(1),(2)
Call obj.sub1(1, 2).sub2(1, 2)

How can I distinguish a call whose parentheses enclose the whole argument list, as in sub1(1, 2), from a call whose parentheses enclose individual arguments, as in sub2(1),(2)?

Answer by Lucero (best answer)

The problem you're having is that the VBScript syntax is ambiguous.

Which variant is obj.sub1 (1)? It could be the form with parentheses around the argument list, or the form without them in which the first argument happens to be a parenthesized expression.

If we cannot tell, then neither can the parser. We can only tell for sure when there are multiple arguments, i.e. when a comma is present. Therefore, let's say that by default we treat parentheses as enclosing the argument list only when they contain more than one parameter or no parameter at all.
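To make the comma rule concrete, here is a minimal Python sketch that classifies the text following a sub name. It is only an illustration of the reasoning above, not part of the grammar: the helper names (split_top_level, is_wrapped, classify_args) and the returned labels are made up for this example, and it assumes the arguments contain no string literals with commas or parentheses in them.

```python
def split_top_level(text):
    """Split on commas that sit outside any parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts


def is_wrapped(text):
    """True if the first '(' is closed only by the very last character."""
    if not (text.startswith("(") and text.endswith(")")):
        return False
    depth = 0
    for index, ch in enumerate(text):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return index == len(text) - 1
    return False


def classify_args(tail):
    """Classify the text after a sub name (hypothetical helper and labels)."""
    tail = tail.strip()
    if not tail:
        return "no arguments"
    if not is_wrapped(tail):
        # Covers "1, 2" as well as "(1),(2)": the parentheses, if any,
        # belong to individual argument expressions.
        return "bare argument list"
    inner = tail[1:-1].strip()
    if not inner:
        return "enclosing parentheses, no arguments"
    if len(split_top_level(inner)) >= 2:
        # A comma inside the outer parentheses settles the question.
        return "enclosing parentheses"
    # "(1)" alone: both readings are possible, so a default is needed.
    return "ambiguous"


for tail in ["1, 2", "(1),(2)", "(1, 2)", "(1)"]:
    print(tail, "->", classify_args(tail))
```

Only the last input, "(1)", comes back as "ambiguous"; that is the one case where the grammar has to commit to a default reading.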

In your efforts to solve the issue you have started to create terminals that do too much, such as IDDot and ').', which fold the dot into a neighbouring token. This is a bad idea: it doesn't work (spaces around the '.' are allowed but are not matched by these terminals), and it can lead to unexpected tokenization behavior and degraded performance. Typically it indicates a problem in your grammar.
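The whitespace problem can be shown with a small Python sketch. It is not how GOLD tokenizes; the regular expressions are ASCII approximations of the ID terminal, and the names FUSED_ID_DOT, SEPARATE and tokenize are invented for the illustration. It takes the statement above, that spaces around the dot are allowed, as given.

```python
import re

# ASCII approximation of the ID terminal (letter or '_' followed by
# alphanumerics or '_').
ID = r"[A-Za-z_][A-Za-z0-9_]*"

# A fused terminal in the style of IDDot: identifier immediately
# followed by a dot, with no room for whitespace in between.
FUSED_ID_DOT = re.compile(ID + r"\.")

# Separate terminals: skip blanks, then match either an ID or a dot.
SEPARATE = re.compile(r"[ \t]*(" + ID + r"|\.)")


def tokenize(text):
    """Tokenize a member path using separate ID and '.' terminals."""
    tokens, pos = [], 0
    text = text.rstrip(" \t")
    while pos < len(text):
        match = SEPARATE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


# The fused terminal matches "obj." but not "obj ." ...
print(FUSED_ID_DOT.match("obj.sub1") is not None)
print(FUSED_ID_DOT.match("obj .sub1") is not None)
# ... whereas separate terminals give the same tokens either way.
print(tokenize("obj.sub1") == tokenize("obj . sub1"))
```

With separate terminals the whitespace is discarded between tokens, so the parser sees the same ID '.' ID sequence whether or not the source has spaces around the dot.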

Here's a grammar I hacked together which parses your samples just fine, and which should also handle assignments and constructor calls correctly. Note that <ParameterList> only matches lists that contain at least one comma, i.e. two or more parameters, never zero or one; this is the key to working around the ambiguity that causes your issue.

<CallStmt>    ::= 'Call' <CallPath>
               |  <CallPath>

<AssignStmt>  ::= <AssignPath> '=' <Expr>
               |  'Set' <AssignPath> '=' <Expr>
               |  'Set' <AssignPath> '=' 'New' <CtorPath>

<CtorPath>    ::= ID '.' <CtorPath>
               |  ID
               |  <CallExpr>
               |  ID <ParameterList>

<CallExpr>    ::= ID '(' ')'
               |  ID '(' <Expr> ')'
               |  ID '(' <ParameterList> ')'

<CallPath>    ::= <Member> '.' <CallPath>
               |  ID
               |  <CallExpr>
               |  ID <ParameterList>

<Member>      ::= <CallExpr>
               |  ID

<MemberPath>  ::= <Member> '.' <MemberPath>
               |  <Member>

<AssignPath>  ::= <Member> '.' <AssignPath>
               |  ID

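! VBScript allows arguments to be skipped, e.g. a(1,,2)
! The ',' <Expr> alternative covers a skipped argument followed by one final argument
<ParameterList> ::= <Expr> ',' <ParameterList>
               | <Expr> ',' <Expr>
               | <Expr> ','
               | ',' <ParameterList>
               | ',' <Expr>
               | ','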

! Value can be reduced from <Expr>                       
<Value>       ::= NumberLiteral
               | StringLiteral
               | <MemberPath>
               | '(' <Expr> ')'

!--- The rest of the grammar ---               
"Start Symbol"  = <Start>

{WS}            = {Whitespace} - {CR} - {LF}
{ID Head}       = {Letter} + [_]
{ID Tail}       = {Alphanumeric} + [_]
{String Chars}  = {Printable} + {HT} - ["]

Whitespace      = {WS}+
NewLine         = {CR}{LF} | {CR} | {LF}

ID              = {ID Head}{ID Tail}*
StringLiteral   = ('"' {String Chars}* '"')+
NumberLiteral   = {Number}+ ('.' {Number}+ )?

<nl>          ::= NewLine <nl>          !One or more
               |  NewLine

<nl Opt>      ::= NewLine <nl Opt>      !Zero or more
               |  !Empty

<Start>       ::= <nl opt> <StmtList>

<StmtList>    ::= <CallStmt> <nl> <StmtList>
               |  <AssignStmt> <nl> <StmtList>
               |

<Expr>        ::= <Add Exp> 

<Add Exp>     ::= <Add Exp> '+' <Mult Exp>
               |  <Add Exp> '-' <Mult Exp>
               |  <Mult Exp> 

<Mult Exp>    ::= <Mult Exp> '*' <Negate Exp> 
               |  <Mult Exp> '/' <Negate Exp> 
               |  <Negate Exp> 

<Negate Exp>  ::= '-' <Value> 
               |  <Value>