pyPEG: attributes defined with `flag()` are composed incorrectly by `compose()`


I need to parse a legacy format. What I want is a parser that recognizes the format and transforms it into an object that is easier to work with.

Parsing the input works; the problem is transforming the result back into a string. In short: when I pass the result of parse() to compose(), it does not return the correct string.

Below are the output and the source code. I'm a beginner with PEG; is there anything I have misunderstood? Notice that my initial string contains (126000-147600,3); but in the composed string it has a - in front of it. The same happens to (320400).

Output:

********************************************************************************
-t gmt+1 -n GB_EN -p '39600-61200,0; (126000-147600,3); -(212400-234000,5); 298800; (320400); 385200-406800,0; 471600-493200,0; 558000-579600,0'
********************************************************************************
gmt+1 GB_EN
********************************************************************************
[{'end': '61200', 'interval': '0', 'start': '39600'},
 {'end': '147600', 'interval': '3', 'start': '126000'},
 {'end': '234000', 'interval': '5', 'inverted': True, 'start': '212400'},
 {'start': '298800'},
 {'start': '320400'},
 {'end': '406800', 'interval': '0', 'start': '385200'},
 {'end': '493200', 'interval': '0', 'start': '471600'},
 {'end': '579600', 'interval': '0', 'start': '558000'}]
-t gmt+1 -n GB_EN -p '39600-61200,0; -(126000-147600,3); -(212400-234000,5); 298800; -(320400); 385200-406800,0; 471600-493200,0; 558000-579600,0'
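A quick way to see exactly which entries change is to compare the -p lists of the two strings item by item. The stdlib-only sketch below does that; diff_items is a hypothetical helper, not part of pypeg2.

```python
from itertools import zip_longest


def diff_items(original, composed, sep=";"):
    """List (index, original_item, composed_item) for each item that differs."""
    left = [item.strip() for item in original.split(sep)]
    right = [item.strip() for item in composed.split(sep)]
    # zip_longest pads the shorter list with None, so missing items also show up.
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip_longest(left, right))
        if a != b
    ]


original = "39600-61200,0; (126000-147600,3); -(212400-234000,5); 298800; (320400); 385200-406800,0; 471600-493200,0; 558000-579600,0"
composed = "39600-61200,0; -(126000-147600,3); -(212400-234000,5); 298800; -(320400); 385200-406800,0; 471600-493200,0; 558000-579600,0"

for index, before, after in diff_items(original, composed):
    print(index, before, "->", after)
```

On these two lists it reports items 1 and 4, i.e. exactly the parenthesised entries that had no leading -, while the already inverted -(212400-234000,5) round-trips unchanged.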

Python source code:

import re
from pprint import pprint

from pypeg2 import *

Timezone = re.compile(r"(?i)gmt[\+\-]\d")
TimeValue = re.compile(r"[\d]+")

class ObjectSerializerMixin(object):

    def get_as_object(self):
        obj = {}

        for attr in ['start', 'end', 'interval', 'inverted']:
            if getattr(self, attr, None):
                obj[attr] = getattr(self, attr)

        return obj

class TimeFixed(str, ObjectSerializerMixin):
    grammar = attr('start', TimeValue)

class TimePeriod(Namespace, ObjectSerializerMixin):
    grammar = attr('start', TimeValue), '-', attr('end', TimeValue), ',', attr('interval', TimeValue)

class TimePeriodWrapped(Namespace, ObjectSerializerMixin):
    grammar = flag("inverted", '-'), "(", attr('start', TimeValue), '-', attr('end', TimeValue), ',', attr('interval', TimeValue), ")"

class TimeFixedWrapped(Namespace, ObjectSerializerMixin):
    grammar = flag("inverted", '-'), "(", attr('start', TimeValue), ")"


class TimeList(List):
    grammar = csl([TimePeriod, TimeFixed, TimePeriodWrapped, TimeFixedWrapped], separator=";")

    def __str__(self):
        for a in self:
            print(a.get_as_object())
        return ''

class AlertExpression(List):
    grammar = '-t', blank, attr('timezone', Timezone), blank, '-n', blank, attr('locale'), blank, "-p", optional(blank),  "'", attr('timelist', TimeList), "'"

    def get_time_objects(self):
        for item in self.timelist:
            yield item.get_as_object()

    def __str__(self):
        return '{} {}'.format(self.timezone, self.locale)


if __name__ == '__main__':

    s="""-t gmt+1 -n GB_EN -p '39600-61200,0; (126000-147600,3); -(212400-234000,5); 298800; (320400); 385200-406800,0; 471600-493200,0; 558000-579600,0'"""

    p = parse(s, AlertExpression)

    print("*"*80)
    print(s)
    print("*"*80)
    print(p)
    print("*"*80)
    pprint(list(p.get_time_objects()))

    print(compose(p))

Accepted answer by J Richard Snape

I'm pretty sure this is a bug in pypeg2.

You can verify this with a simplified version of a pypeg2 documentation example, using values similar to yours:

>>> from pypeg2 import *
>>> class AddNegation:
...     grammar = flag("inverted",'-'), blank, "(1000-5000,3)"
...
>>> t = AddNegation()
>>> t.inverted = False
>>> compose(t)
'- (1000-5000,3)'
>>> t.inverted = True
>>> compose(t)
'- (1000-5000,3)'

This minimal example demonstrates that the value of the flag attribute (inverted) has no effect on the composition. As you found yourself, your parsing works as intended.

I had a quick look through the compose() code. The whole module is written in a single __init__.py file, and the compose function is recursive. As far as I can tell, the problem is that when the flag is False, the - object is still passed into compose (at the bottom level of the recursion) as a str and is simply appended to the composed string.

Update: I isolated the bug to line 1406 of that __init__.py, which unpacks the flag attribute incorrectly: it sends the string '-' back to compose(), which appends it regardless of the value of the property (a bool).
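To make the mechanism concrete, here is a toy, stdlib-only model of composing a flag. It is NOT pypeg2's actual code; Flag, compose_buggy and compose_fixed are hypothetical names. compose_buggy appends the flag's text unconditionally, which reproduces the symptom, while compose_fixed consults the bool stored on the object first.

```python
class Flag:
    """Toy stand-in for a flag(name, text) grammar element (not pypeg2's class)."""

    def __init__(self, name, text):
        self.name = name
        self.text = text


def compose_buggy(thing, grammar):
    out = []
    for part in grammar:
        if isinstance(part, Flag):
            # Bug being modelled: the flag is unpacked to its text and appended
            # unconditionally; the bool stored on `thing` is never consulted.
            out.append(part.text)
        else:
            out.append(part)
    return "".join(out)


def compose_fixed(thing, grammar):
    out = []
    for part in grammar:
        if isinstance(part, Flag):
            # Emit the text only if the attribute exists and is truthy;
            # a missing attribute is treated as False.
            if getattr(thing, part.name, False):
                out.append(part.text)
        else:
            out.append(part)
    return "".join(out)


class Item:
    pass


grammar = [Flag("inverted", "-"), "(1000-5000,3)"]
item = Item()
item.inverted = False
print(compose_buggy(item, grammar))  # -(1000-5000,3)
print(compose_fixed(item, grammar))  # (1000-5000,3)
```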

A partial workaround is to replace that line with text.append(self.compose(thing, g)), similar to the clauses above it (so Attribute types are treated the same as they would be ordinarily once they are unpacked from a tuple). However, you then hit another bug: optional attributes (a flag is just a special case of type Attribute) are not composed properly when they are missing from the object.

As a workaround for that, you could go to line 1350 of the same file and replace

        if grammar.subtype == "Flag":
            if getattr(thing, grammar.name):
                result = self.compose(thing, grammar.thing, attr_of=thing)
            else:
                result = terminal_indent()

with

        if grammar.subtype == "Flag":
            try:
                if getattr(thing, grammar.name):
                    result = self.compose(thing, grammar.thing, attr_of=thing)
                else:
                    result = terminal_indent()
            except AttributeError:
                # attribute is missing from the object: insert nothing
                result = terminal_indent()

I'm not sure this is a totally robust fix, but it's a workaround that will get you going.
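The try/except in that patch can arguably be shortened, because getattr() accepts a default that is returned when the attribute is missing. A standalone check of that equivalence follows; both helper names are hypothetical and are not pypeg2 functions.

```python
from types import SimpleNamespace


def flag_is_set_try(thing, name):
    # Same decision as the patched code: a missing attribute counts as "not set".
    try:
        return bool(getattr(thing, name))
    except AttributeError:
        return False


def flag_is_set_default(thing, name):
    # Equivalent one-liner: getattr() returns the default if the attribute is missing.
    return bool(getattr(thing, name, False))


cases = [
    SimpleNamespace(),                # attribute missing
    SimpleNamespace(inverted=False),  # flag present but not set
    SimpleNamespace(inverted=True),   # flag set
]
print([flag_is_set_try(c, "inverted") for c in cases])      # [False, False, True]
print([flag_is_set_default(c, "inverted") for c in cases])  # [False, False, True]
```

If that holds, the patched condition could be written as if getattr(thing, grammar.name, False): instead of the try/except. This is only a suggestion and has not been checked against pypeg2 itself.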

Output

With those two workarounds / fixes applied to the pypeg2 module file, the output you get from print(compose(p)) is

-t gmt+1 -n GB_EN -p '39600-61200,0; (126000-147600,3); -(212400-234000,5); 298800; (320400); 385200-406800,0; 471600-493200,0; 558000-579600,0'

as desired and you can continue to use the pypeg2 module.
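If editing the installed pypeg2 file is not an option, one stopgap is to format the time list yourself instead of relying on compose() for it. A minimal stdlib-only sketch follows. TimeItem, format_item and format_time_list are hypothetical names, not pypeg2 API, and the wrapped field is an assumption: the dicts returned by get_as_object() cannot distinguish 298800 from (320400), or 39600-61200,0 from (126000-147600,3), so that information has to come from elsewhere, for example from the class of each parsed object.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeItem:
    """Hypothetical plain-data version of one entry of the -p list."""
    start: str
    end: Optional[str] = None
    interval: Optional[str] = None
    wrapped: bool = False   # assumption: True for the "(...)" forms
    inverted: bool = False  # what flag("inverted", '-') is meant to control


def format_item(item: TimeItem) -> str:
    body = item.start
    if item.end is not None:
        body += "-{},{}".format(item.end, item.interval)
    if not item.wrapped:
        return body
    # The grammar only allows a leading '-' on parenthesised items,
    # and it is written only when the flag is actually set.
    return ("-" if item.inverted else "") + "(" + body + ")"


def format_time_list(items) -> str:
    return "; ".join(format_item(item) for item in items)


items = [
    TimeItem("39600", "61200", "0"),
    TimeItem("126000", "147600", "3", wrapped=True),
    TimeItem("212400", "234000", "5", wrapped=True, inverted=True),
    TimeItem("298800"),
    TimeItem("320400", wrapped=True),
]
print(format_time_list(items))
# 39600-61200,0; (126000-147600,3); -(212400-234000,5); 298800; (320400)
```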