I've implemented a Transpiler for a C-like language using the C# version of ANTLR3 on Windows7.
To parse if/else statements I use the following rule:
ifStatement
: 'if' '(' expression ')' b1=block
(
'else' b2=block -> ^('if' expression $b1 $b2 'else')
| 'else' ifStatement -> ^('if' expression $b1 ^(ELSIF ifStatement))
| -> ^('if' expression $b1)
)
;
To translate the tree generated by this rule from my own language into C# I use the following grammar snippet:
ifStatement
options { backtrack = true; }
:
^(n='if' expression b1=block)
-> if(
node={$n},
cond={$expression.st},
block1={$b1.st},
block2={null},
isElsif={($n.Parent.Text == "ELSIF") ? "true" : null},
node2={null}
)
|
^(n='if' expression b1=block b2=block n2='else')
-> if(
node={$n},
cond={$expression.st},
block1={$b1.st},
block2={$b2.st},
isElsif={($n.Parent.Text == "ELSIF") ? "true" : null},
node2={$n2}
)
|
^(n='if' expression b1=block b2=ifStatement)
-> elsif(
node={$n},
cond={$expression.st},
block1={$b1.st},
block2={$b2.st},
isElsif={($n.Parent.Text == "ELSIF") ? "true" : null}
)
|
^(ELSIF i=ifStatement) -> { $i.st }
;
This works fine in most cases, for example it translates the following code without problems:
if (x == "1") {
}
else if (x == "2") {
}
else if (x == "3") {
}
But when I have more than 20 "else if" clauses, the compiler needs several minutes to do its work. The time needed for compilation does not increase linearly, that is the compiler returns immediately for 17 or 18 "else if" clauses.
Update:
I have fixed the issue. I've replaced
^(n='if' expression b1=block b2=ifStatement)
-> elsif(
node={$n},
cond={$expression.st},
block1={$b1.st},
block2={$b2.st},
isElsif={($n.Parent.Text == "ELSIF") ? "true" : null}
)
|
^(ELSIF i=ifStatement) -> { $i.st }
with
^(n='if' expression b1=block ^(ELSIF b2=ifStatement))
-> elsif(
node={$n},
cond={$expression.st},
block1={$b1.st},
block2={$b2.st},
isElsif={($n.Parent.Text == "ELSIF") ? "true" : null}
)
That is I've merged the last two alternatives into a single one.
Now the Transpiler is blazingly fast and still returns correct results.
I'd really like to know why this change makes such a big difference.
Welcome to backtracking.
If you force the parser to choose between (for example) three choices (as you have in your grammar), and it has to backtrack, and the choices nest [as your if-then-else does], then a nested set of N constructs can require 3^N units of work to parse. 3^20 is 27 times as long as 3^17.
Lesson: backtracking is useful sometimes, but generally you should avoid it.
For your grammar, why not treat the if construct just like every other statement? Then it doesn't show up as a special case, and you can avoid the backtracking altogether. You'll even get the standard "else attaches to the closest if" rule that is standard in most programming langauges.