Limitations of PEG grammar & parser generators?

7.7k views Asked by At

I was enjoying using YARD a lot:

http://www.ootl.org/yard/

http://code.google.com/p/yardparser/

http://www.codeproject.com/KB/recipes/yard-tokenizer.aspx

I was able to construct fully functional calculator. I'm evaluating YARD to do PHP parser. Please kindly advise on limitations of PEG grammar & parser generators. Thank you very much!

2

There are 2 answers

10
DrPizza On BEST ANSWER

I think the big "problem" with PEGs is that they don't fit into the normal taxonomy of grammars as they operate in a fundamentally different way. Normal grammars are "backwards" in the sense that they describe all the possible sentences (programs) that can be generated. PEGs describe how to parse--they come at the problem from the other end.

In my view this is a more natural way to think about the problem, and certainly for any hand-written (recursive-descent) parser I wouldn't do anything else.

5
hippietrail On

The main limitation of PEG grammars is that they don't deal with ambiguity at all.

To be sure, this is also their strength since dealing with ambiguities is one of the most frustrating parts of using a CFG (context free grammar) tool.

With PEGs you deal with ambiguities explicitly by ordering the rule you want to match before another rule that would match ambiguously but which you don't want.

The problem is that you don't always even know about some or even any of the ambiguities in a language or a grammar and PEG generators, at least the ones I've tried, don't analyse the grammar for ambiguity to help you find them and then design and order your rules to deal with them the right way.

CFG parser generators like yacc and bison analyse your grammar and report all the ambiguities. Unfortunately they often report them in a pretty cryptic way that can be hard to make sense of. And of course it's often hard to fix the grammar to deal with them. But at least you will be aware that they exist.

With a PEG grammar you can be blissfully ignorant of the ambiguities in your conceptual grammar because once you make it a PEG it doesn't have ambiguities any more, it just has matching rules and maybe silently unreachable rules which would also match if they had higher precedence. These might not show up in your testing but may show up after release.

With CFG grammars you are forced to deal with ambiguities during development, but it won't be easy.


In the event I'm not making it clear, here's a six-year-old discussion by Joshua Haberman over on the Lambda the Ultimate programming languages blog: PEGs and Packrat Parsing are not the answer.