I have a fairly simple language represented as a CFG.
S → A z
A → A y A
| A x A
| A w
| v
Since there's left-recursion, a recursive descent parser isn't going to cut it. However, I also need to find every possible interpretation: given v x v y v z, I need my parser to find both (v x (v y v)) z and ((v x v) y v) z.
What options do I have? Shift-reduce with added backtracking to find all possibilities seems good, but I've heard that adding backtracking to a shift-reduce parser can give it exponential time complexity. This CFG is small enough that it shouldn't matter, but I'm going to need to scale this up to significantly larger grammars (with thousands of terminals).
The Earley algorithm (and related general parsing algorithms, such as GLR) can be implemented to produce a parser which runs in cubic time. Since there can be an exponential number of possible parses, that cubic bound applies only to building a "parse forest": a DAG which compactly represents all possible parses by sharing common subtrees. Actually enumerating the individual parse trees will take time proportional to the number of them.
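To make the sharing idea concrete, here is a minimal sketch (not a full Earley implementation) that enumerates all parses of your grammar by memoized dynamic programming over input spans. The memoization plays the role of the parse forest: each sub-span is analyzed once, even though the number of complete parses can grow exponentially.

```python
from functools import lru_cache

# Sketch: enumerate all parses for the grammar from the question:
#   S -> A z
#   A -> A y A | A x A | A w | v
# Every rule's right-hand side covers a strictly smaller span than
# its left-hand side, so the recursion terminates despite the
# left-recursion that defeats naive recursive descent.

tokens = "v x v y v z".split()

@lru_cache(maxsize=None)
def A(i, j):
    """All parse trees (as strings) for nonterminal A over tokens[i:j]."""
    out = []
    if j - i == 1 and tokens[i] == "v":            # A -> v
        out.append("v")
    if j - i >= 2 and tokens[j - 1] == "w":        # A -> A w
        out += [f"({t} w)" for t in A(i, j - 1)]
    for k in range(i + 1, j - 1):                  # A -> A x A | A y A
        if tokens[k] in ("x", "y"):
            for left in A(i, k):
                for right in A(k + 1, j):
                    out.append(f"({left} {tokens[k]} {right})")
    return out

def S():
    """All parses of the full input under S -> A z."""
    n = len(tokens)
    if n >= 2 and tokens[-1] == "z":
        return [f"{t} z" for t in A(0, n - 1)]
    return []

for parse in S():
    print(parse)
```

For v x v y v z this prints exactly the two parses you listed. A real Earley or GLR parser achieves the cubic bound with a chart of (rule, position, origin) items rather than brute-force span splitting, but the storage-sharing principle is the same.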
Note that when we talk about time complexity here, it's in terms of the length of the input string, not the size of the grammar. The size of the grammar has an impact too, of course -- usually linear, though it depends on how you measure the size -- but the assumption is that a parser has already been built for a particular grammar (which might require preprocessing whose cost depends on the grammar size).
The Wikipedia article linked above has a list of implementations in various languages, some of them intended for production and others simply to demonstrate the algorithm. Note that the GLR parser produced by bison is not cubic; in pathological cases, it can take exponential time. Also, it does not build a parse tree (or forest); that is left to the user, and a naive algorithm which doesn't share storage could require exponential space and time. (Nonetheless, it is quite usable for real-world grammars.)