Non-greedy list parsing with pyparsing

Asked by Jonathan Barber on 01 January 2025 at 21:07 (555 views)

I have a string consisting of a list of words which I am attempting to parse with pyparsing.

The list always has at least three items. From this I want pyparsing to generate three groups: the first should contain all of the words up to the last two items, and the other two should be the last two items themselves. For example:

"one two three four"

should be parsed to something resembling:

["one two"], "three", "four"

I can do this with a Regex:

import pyparsing as pp
data = "one two three four"
grammar = pp.Regex(r"(?P<first>(\w+\W?)+)\s(?P<penultimate>\w+) (?P<ultimate>\w+)")
print(grammar.parseString(data).dump())

which gives:

['one two three four']
- first: one two
- penultimate: three
- ultimate: four

My problem is that I'm failing to get the same result with the non-Regex ParserElements because of pyparsing's greedy nature. For example, the following:

import pyparsing as pp
data = "one two three four"
word = pp.Word(pp.alphas)
grammar = pp.Group(pp.OneOrMore(word))("first") + word("penultimate") + word("ultimate")
grammar.parseString(data)

fails with the traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/pyparsing.py", line 1125, in parseString
    raise exc
pyparsing.ParseException: Expected W:(abcd...) (at char 18), (line:1, col:19)

because OneOrMore slurps all of the words in the list. My attempts so far to prevent this greedy behaviour with FollowedBy or NotAny have failed. Any suggestions as to how I can get the desired behaviour?
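
The greedy behaviour can be seen in isolation. The following is a minimal sketch, assuming the same `word` definition as above; the variable names `greedy` and `failed_at` are illustrative only. It shows that OneOrMore(word) on its own consumes every word, so nothing is left for the two trailing `word` expressions:

```python
import pyparsing as pp

data = "one two three four"
word = pp.Word(pp.alphas)

# On its own, OneOrMore(word) matches every word in the string.
greedy = pp.OneOrMore(word).parseString(data).asList()
print(greedy)

# pyparsing does not backtrack out of OneOrMore, so the trailing
# word expressions find no input left and the parse fails.
grammar = pp.Group(pp.OneOrMore(word))("first") + word("penultimate") + word("ultimate")
try:
    grammar.parseString(data)
    failed_at = None
except pp.ParseException as exc:
    failed_at = exc.loc  # position at which the parser gave up
print(failed_at)
```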

There is 1 answer

**PaulMcG** · Accepted Answer · 2015-06-18T23:23:58+00:00

Well, your OneOrMore expression just needs a little tightening up; you are on the right track with FollowedBy. You don't really want just OneOrMore(word), you want "OneOrMore(word that is followed by at least 2 more words)". To add this kind of lookahead to pyparsing, you can even use the new '*' multiplication operator to specify the lookahead count:

grammar = pp.Group(pp.OneOrMore(word + pp.FollowedBy(word*2)))("first") + word("penultimate") + word("ultimate")

Now dumping this out gives the desired result:

[['one', 'two'], 'three', 'four']
- first: ['one', 'two']
- penultimate: three
- ultimate: four
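
For reference, the pieces above can be assembled into one runnable script. This is a sketch rather than part of the original answer: the `parse_words` helper is a hypothetical name, and the NotAny variant (written with `~` and anchored with StringEnd) is an assumed alternative that expresses the same idea as a negative lookahead.

```python
import pyparsing as pp

word = pp.Word(pp.alphas)

# Lookahead grammar from the answer: a word joins "first" only while
# at least two more words follow it.
grammar = (
    pp.Group(pp.OneOrMore(word + pp.FollowedBy(word * 2)))("first")
    + word("penultimate")
    + word("ultimate")
)

# Assumed alternative (not from the answer): stop collecting as soon as
# exactly two words remain before the end of the string.
last_two = word + word + pp.StringEnd()
grammar_notany = (
    pp.Group(pp.OneOrMore(~last_two + word))("first")
    + word("penultimate")
    + word("ultimate")
)


def parse_words(expr, text):
    """Hypothetical helper: return the parsed list, or None on failure."""
    try:
        return expr.parseString(text).asList()
    except pp.ParseException:
        return None


for expr in (grammar, grammar_notany):
    print(parse_words(expr, "one two three four"))
    print(parse_words(expr, "one two three"))
    print(parse_words(expr, "one two"))
```

With exactly three words, "first" holds a single word. With fewer than three, both grammars raise ParseException, which the helper turns into None. The two grammars differ in one respect: the FollowedBy version does not require the list to end the string, while the NotAny version as written does.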
