PEG parsing: match at least one element while preserving order


Given the PEG rule:

rule = element1:'abc' element2:'def' element3:'ghi' ;

How do I rewrite this so that it matches at least one of the elements, and possibly all of them, while enforcing their order?

I.e. I would like to match all of the following lines:

abc def ghi
abc def
abc     ghi
    def ghi
abc
    def
        ghi

but not an empty string or misordered expressions, e.g. def abc.

Of course, with three elements I could spell out the combinations in separate rules, but as the number of elements increases, this becomes error-prone.

Is there a way to specify this in a concise manner?


There are 2 answers

Apalala On BEST ANSWER

You can use optionals:

rule = [element1:'abc'] [element2:'def'] [element3:'ghi'] ;

You would use a semantic action for rule to check that at least one token was matched:

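from tatsu.exceptions import FailedSemantics  # assuming TatSu; in Grako: grako.exceptions

class Semantics:
    def rule(self, ast):
        # Reject the match if every optional element was skipped.
        if not (ast.element1 or ast.element2 or ast.element3):
            raise FailedSemantics('Expecting at least one token')
        return ast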
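To see the idea without any parser generator, here is a minimal stdlib-only sketch of "ordered optionals, then check that something matched". The function `parse_rule`, the `TOKENS` table, and the use of `ValueError` are assumptions made for illustration; they are not part of the grammar tool used above.

```python
# Ordered (label, token) pairs, mirroring the named elements of the rule.
TOKENS = [("element1", "abc"), ("element2", "def"), ("element3", "ghi")]


def parse_rule(text):
    """Match each token as an optional, in order; require at least one."""
    pos = 0
    ast = {}
    for label, token in TOKENS:
        # Skip whitespace before trying the optional token.
        start = len(text) - len(text[pos:].lstrip())
        if text.startswith(token, start):
            ast[label] = token
            pos = start + len(token)
        else:
            # An optional that does not match consumes nothing.
            ast[label] = None
    # Only trailing whitespace may remain after the last element.
    if text[pos:].strip():
        raise ValueError(f"unexpected input at position {pos}")
    # Equivalent of the semantic action: an all-empty match is rejected.
    if not any(ast.values()):
        raise ValueError("Expecting at least one token")
    return ast
```

With this sketch, `parse_rule("abc     ghi")` returns a dict with `element2` set to `None`, while `parse_rule("")` and `parse_rule("def abc")` both raise `ValueError`.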

Another option is to use several choices:

rule 
    = 
       element1:'abc' [element2:'def'] [element3:'ghi'] 
    | [element1:'abc']  element2:'def' [element3:'ghi'] 
    | [element1:'abc'] [element2:'def'] element3:'ghi' 
    ;
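Since the question worries that writing the alternatives by hand gets error-prone as elements are added, the choices could also be generated. A minimal sketch, where `at_least_one_rule` is a hypothetical helper (not part of any parser library) that only builds the grammar text in the notation used above, and assumes the tokens contain no quote characters:

```python
def at_least_one_rule(name, elements):
    """Build grammar text with one alternative per mandatory element.

    `elements` is a list of (label, token) pairs in the required order.
    """
    alternatives = []
    for required in range(len(elements)):
        terms = []
        for index, (label, token) in enumerate(elements):
            term = f"{label}:'{token}'"
            # Exactly one element per alternative is mandatory;
            # the others are wrapped in [] to make them optional.
            terms.append(term if index == required else f"[{term}]")
        alternatives.append(" ".join(terms))
    body = "\n    | ".join(alternatives)
    return f"{name}\n    =\n      {body}\n    ;"


print(at_least_one_rule("rule", [("element1", "abc"),
                                 ("element2", "def"),
                                 ("element3", "ghi")]))
```

The number of alternatives grows linearly with the number of elements, so the generated grammar stays manageable.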

Caching will make the latter as efficient as the former.

Then, you can add cut elements for additional efficiency and more meaningful error messages:

rule 
    = 
       element1:'abc' ~  [element2:'def' ~] [element3:'ghi' ~] 
    | [element1:'abc' ~]  element2:'def' ~  [element3:'ghi' ~] 
    | [element1:'abc' ~] [element2:'def' ~] element3:'ghi'  ~
    ;

or:

rule = [element1:'abc' ~] [element2:'def' ~] [element3:'ghi' ~] ;

david.pfx On

The answer is: a lookahead predicate over the disjunction of the elements, followed by a sequence of optionals.

rule = &(e1 / e2 / e3) e1? e2? e3?

This is standard PEG: & is the and-predicate, which requires its operand to match without consuming any input, and ? marks an element as optional. Most PEG parsers provide both features, though not always with these symbols.
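Python's re module has the same kind of lookahead, written `(?=...)`, so the shape of this rule can be tried out without a PEG library. This is only an analogy under stated assumptions: regex alternation backtracks where PEG choice is ordered, whitespace skipping is spelled out by hand, and `RULE` and `matches` are names made up for the sketch.

```python
import re

# \s*               leading whitespace
# (?=abc|def|ghi)   lookahead: one of the elements must come next (consumes nothing)
# (?:abc\s*)? ...   each element optional, in fixed order
RULE = re.compile(r"\s*(?=abc|def|ghi)(?:abc\s*)?(?:def\s*)?(?:ghi\s*)?")


def matches(line):
    """Return True if the whole line is accepted by the rule."""
    return RULE.fullmatch(line) is not None
```

The lookahead rejects the empty string, and the fixed order of the optionals rejects input such as `def abc`, because `abc` is left over after the last optional has been tried.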