I'm currently trying to familiarize myself with packrat parsing. So I've read the PDF paper from 2002 linked here and in section 2.3 it describes packrat caching as a preliminary process (which occurs before the actual parsing) in which a full caching table is pre-constructed by reading the input from right to left. Only then, the actual linear parsing from left to right can start.
But in every PEG parser implementation I found, the "cache" option is usually a caching process that occurs during the actual left to right parsing. For example here.
Is there any difference between both approaches? Thank you.
I recently worked on similar research, met the exact same confusion, and resolved it. Regardless if you are still working on this topic, here's my answer.
Your understanding is correct:
But there's just one approach, not two. Let's use one simple example Parsing Expression Grammar (PEG) without left-recursion:
E -> Num + E | Num(Note that, a left-recursion example requires another long explanation, you can refer CPython's implementation for details)
The Syntax Directed Translation (SDT) will be something like:
And we can write a
parse_Efunction in below:According to Byran Ford's paper, the parser needs to scan the input string from left to right and construct the cache in any position:
So, let's check the cache construction under the hood, initially, we have an empty cache and input string:
The function call happens in the following order when
idx=0. Clearly, we construct the cache from right to left at position 0 (not even to mentionidx=1or above).parse_Char(Y)happens earlier thanparse_Char(X)(Y > X)parse_Char(X)must happens earlier thanparse_E(X)If you want to read the full Python example of Packrat parser implementation, you can check my repository. It contains a handmade Packrat parser and a CPython PEG generated Packrat parser based on a simple meta grammar.