Cannot get LL(1) form of grammar for recursive descent parser

300 views Asked by At

I have grammar:

S -> bU | ad | d
U -> Ufab | VSc | bS
V -> fad | f | Ua

To contruct recursive descent parser I need LL(1) form. Best I got is:

S -> bU | ad | d
U -> fY | bSX
Y -> adScX | ScX
X -> fabX | aScX | ε

Removed left recursions and done some left factoring but I am stuck. Tried for several hours but I cannot get it...

E.g. valid string are:

  • bbdfabadc
  • bbdfabfabfabfab
  • bfadadcfabfab
  • bbadaadc
  • bfbbdfabc

Obviously my grammar form is ambiguous for some so I cannot make recursive descent parser...

From answer:

S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adA | bUc | dc
Z -> ε | XZ
A -> Sc | c

Still not LL(1). First and follow for Z are not disjoint.

1

There are 1 answers

1
Chris Dodd On

Generally to make a grammar LL(1) you'll need to repeatedly left factor and remove left recursion until you've managed to get rid of all the non-LL things. Which you do first depends on the grammar, but in this case you'll want to start with left factoring

To left factor the rule

U -> Ufab | VSc | bS

you need to first substitute V giving

U -> Ufab | fadSc | fSc | UaSc | bS

which you then left factor into

U -> UX | fY | bS
X -> fab | aSc
Y -> adSc | Sc

now U is simple enough that you can eliminate the left recursion directly:

U -> fYZ | bSZ
Z -> ε | XZ

giving you

S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adSc | Sc
Z -> ε | XZ

Now you still have a left factoring problem with Y so you need to substitute S:

Y -> adSc | bUc | adc | dc

which you left factor to

Y -> adA | bUc | dc
A -> Sc | c

giving an almost LL(1) grammar:

S -> bU | ad | d
U -> fYZ | bSZ
X -> fab | aSc
Y -> adA | bUc | dc
Z -> ε | XZ
A -> Sc | c

but now things are stuck as the epsilon rule for Z means we need FIRST(X) and FOLLOW(Z) to be disjoint (in order to decide between the two Z rules). This is generally indicative of a non-LL language, as there's some trailing context that could be associated with more than one rule (via the S -> bU -> bbSZ -> bbbUZ -> bbbbSZZ exapansion chain -- trailing Zs can be recognized but either might be empty). Often times you can still recognize this language with a simple recursive-descent parser (or LL-style state table) by simply resolving the Z ambiguity/conflict in favor of the non-epsilon rule.