PEG Grammar to match a space separated or comma separated list

I am trying to make a simple PEG (pegjs) grammar to parse either a space-separated list or a comma-separated list of numbers, but I am clearly missing something fundamental. That is, I want to match strings like "1 2 3" or "1,2,3", but not a mixed expression like "1 2,3".

My attempted grammar (which can be run at https://pegjs.org/online) is:

start = seq

seq = num (" " n:num {return n})*
    / num ("," n:num {return n})*


num = a:$[0-9]+ {return parseInt(a, 10)}

EOL = !.

However, this grammar will only parse a space separated list. If I modify it to be:

start = seq

seq = num (" " n:num {return n})* EOL
    / num ("," n:num {return n})* EOL


num = a:$[0-9]+ {return parseInt(a, 10)}

EOL = !.

it will parse a space-separated or a comma-separated list. However, I feel like I shouldn't need to add EOL to the end of every one of my expressions... I thought that, when given a comma-separated list, pegjs would attempt to match it against the space-separated list rule, fail, and then match it against the comma-separated list rule.

What am I missing?

Answer by rici:

The alternative num (" " n:num {return n})* will successfully match a single num not followed by a space, since the * means "0 or more repetitions" and that means 0 repetitions counts. Once an alternative succeeds, no other alternatives in that set are tried, even if the subsequent parse fails. So that essentially makes the second alternative irrelevant: it is only tried when the input does not start with a num, and in that case it fails as well.
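The same behaviour can be seen in a much smaller grammar. The following is only an illustrative sketch (it is not part of the grammar in the question), meant to be pasted into the pegjs online tool:

```
// Ordered choice: alternatives are tried left to right, and the first
// one that succeeds wins.
//
// With the input "ab", the first alternative matches "a" and the choice
// succeeds. The second alternative is never tried, so the leftover "b"
// makes the overall parse fail.
start = "a" / "ab"

// Listing the longer alternative first accepts both "a" and "ab":
//
//   start = "ab" / "a"
```

The first alternative in the question's seq rule plays the role of "a" here: it succeeds on a prefix of the input, so the alternative that could have matched the whole input is never reached.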

When you add the EOL marker to the alternatives, you prevent the first alternative from succeeding unless it matches to the end. In that case, the next alternative is tried. But, as you say, it's a bit ugly.

Here's one possibility. By factoring out the initial num and changing the repetition operator to + (which will not match an empty input), I force the first alternative to fail if the first character after the num is not a space. The second alternative will then be tried; only if that also fails will the optionality operator be applied.

seq = num ( (" " n:num {return n})+
          / ("," n:num {return n})+
          )?

I tested that briefly with the pegjs online tool. If you use it, you will probably want to do something to flatten the resulting list of numbers.
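One way to do the flattening is sketched below. The labels head and tail are names I have introduced for illustration; the sketch relies on pegjs returning null for an optional expression that does not match.

```
start = seq

// head is the first number; tail is an array of the remaining numbers,
// or null when the input is a single number.
seq = head:num tail:( (" " n:num { return n })+
                    / ("," n:num { return n })+
                    )?
      { return [head].concat(tail || []) }

num = a:$[0-9]+ { return parseInt(a, 10) }
```

With this version, "1 2 3" and "1,2,3" should both produce the flat array [1, 2, 3], "1" should produce [1], and the mixed input "1 2,3" should still be rejected, because seq stops after "1 2" and the generated parser requires the whole input to be consumed.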