How to know which pyparsing ParserElement created which ParseResults?


The problem

I have a bunch of Group-ed expressions like these, all of which contribute to the same grammar:

from pyparsing import Char, Group

first_and_operator = Group(Char('ab')('first') + Char('+-')('operator'))
full_expression = Group(first_and_operator + Char('cd')('second'))
full_expression.parse_string('a + d', parse_all = True)

This, for instance, outputs the following results:

ParseResults([
  ParseResults([
    # How to know that this is first_and_operator's results?
    ParseResults(['a', '+'], {'first': 'a', 'operator': '+'}),  
    'd'
  ], {'second': 'd'})
], {})

The innermost results object has the following attributes, none of which helps in figuring out the original expression (they are also undocumented; there is not a single mention of modal or all_names on this whole page):

[
  (attr, attr_value) for attr in dir(results)
  if not attr.startswith('__')
  and not callable(attr_value := getattr(results, attr))
]
[
  ('_all_names', set()), ('_modal', True), ('_name', 'first'),
  ('_null_values', (None, [], ())), ('_parent', None),
  ('_tokdict', {...: ...}), ('_toklist', ['a', '+']),
  ('first', 'a'), ('operator', '+')
]

If I'm passed an arbitrary ParseResults object, how do I know which expression created it?

Things to consider and what I tried

As you may have noticed, the nested ParseResults object is one level deeper than needed, since full_expression is also a Group. I have an Expand class that does the exact opposite (code omitted here for brevity) which may be used in a parent expression to flatten the results.

This example is heavily simplified; in reality, the equivalents of 'c' and the like are also (possibly nested) ParseResults objects. To handle each of these elements correctly, I need to know the type of the expression that generated it in the first place.

expression_list = Expand(full_expression) + (',' + Expand(full_expression))[1, ...]
expression_list.parse_string('a + c, a - d, b + d', parse_all = True)
ParseResults([
  ParseResults(['a', '+'], {'first': 'a', 'operator': '+'}),
  'c', ',',
  ParseResults(['a', '-'], {'first': 'a', 'operator': '-'}),
  'd', ',',
  ParseResults(['b', '+'], {'first': 'b', 'operator': '+'}),
  'd'
], {})

Expand is a generic subclass of TokenConverter. This means the object passed to Expand.postParse() must be an actual ParseResults, or at least have the same interface as one (iterability, subscriptability, methods, etc.).

I have thought about using .add_parse_action() to convert that ParseResults into a subclass instance like this, but I am not even sure that these are the intended arguments. The parameters are also undocumented and not type hinted, and the code is confusing to read.

class TypedParseResults(ParseResults):
  __slots__ = ('type',)

  def __init__(self, *args, result_type: str, **kwargs):
    self.type = result_type
    super().__init__(*args, **kwargs)

@first_and_operator.add_parse_action
def make_results_typed(results: ParseResults):
  return TypedParseResults(
    results._toklist,
    result_type = 'first_and_operator'
  )

This only ever raises an exception from somewhere deep down in core.py:

Traceback (most recent call last):
  File ".../main.py", line 85, in <module>
    r = full_expression.parse_string('a + d', parse_all = True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  ...
  File ".../site-packages/pyparsing/core.py", line 895, in _parseNoCache
    ret_tokens = ParseResults(
                 ^^^^^^^^^^^^^
TypeError: TypedParseResults.__init__() missing 1 required keyword-only argument: 'result_type'
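
The traceback looks consistent with how Python constructs objects: when a class's __new__ returns an existing instance of that class, __init__ is still called on the returned object with the constructor's arguments. Below is a pure-Python sketch of that mechanism. It assumes that ParseResults.__new__ hands back a ParseResults argument unchanged, which is an assumption about pyparsing's internals rather than documented behaviour. Base and Typed are hypothetical stand-ins for ParseResults and TypedParseResults.

```python
class Base:
    """Hypothetical stand-in for ParseResults."""

    def __new__(cls, value=None, name=None, **kwargs):
        # Assumption: an existing instance is passed through unchanged.
        if isinstance(value, Base):
            return value
        return super().__new__(cls)

    def __init__(self, value=None, name=None):
        self.value = value
        self.name = name


class Typed(Base):
    """Hypothetical stand-in for TypedParseResults."""

    def __init__(self, *args, result_type, **kwargs):
        self.type = result_type
        super().__init__(*args, **kwargs)


# Constructing the subclass directly works, because result_type is supplied.
typed = Typed([1, 2], result_type="first_and_operator")

# Re-wrapping it through the base class does not: Base.__new__ returns the
# Typed instance, so Python then calls Typed.__init__ without result_type.
try:
    Base(typed, name="wrapped")
except TypeError as error:
    message = str(error)
else:
    message = ""

print(message)
```

If that assumption holds, any subclass whose __init__ has required extra arguments would fail the same way as soon as the library wraps the parse action's return value again.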

This question, while it sounds similar, does not answer mine, since it is about debugging and not runtime checking. Also, none of the answers to this question work for me, for various reasons:

  • The first is incompatible with Expand.
  • The second is ugly, even more so with tens of expressions. In addition, it has the same problem as the third, which is explained below.

Some context

This grammar is the origin of two parsers, one CST and one AST; the initial results need to be in some interchangeable format that can then easily be converted to both, or so I think. I would like to make the two parsers independent (no CST to AST conversion) if that is at all possible.

Each expression is a Group and needs to work on its own, which means that .set_name() (only stored on the expression, not the results) and .set_result_name() (might not be the same across parent expressions) don't work.
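
For illustration, here is a minimal sketch of one way a tag could travel with the results instead of the expression: a parse action on each Group stores a label inside the grouped ParseResults under a results name. The helper tag_with and the key 'expr_type' are hypothetical names, not part of pyparsing, and the sketch assumes that assigning a string key on a ParseResults inside a parse action adds a named entry without changing the token list. It has not been checked against Expand.

```python
import pyparsing as pp


def tag_with(type_name):
    """Hypothetical helper: build a parse action that labels a Group's results."""
    def action(tokens):
        # A Group hands its parse action a one-element ParseResults whose
        # only item is the grouped ParseResults; the label is stored there.
        tokens[0]["expr_type"] = type_name
    return action


first_and_operator = pp.Group(pp.Char("ab")("first") + pp.Char("+-")("operator"))
first_and_operator.add_parse_action(tag_with("first_and_operator"))

full_expression = pp.Group(first_and_operator + pp.Char("cd")("second"))
full_expression.add_parse_action(tag_with("full_expression"))

results = full_expression.parse_string("a + d", parse_all=True)
outer = results[0]   # results of full_expression
inner = outer[0]     # results of first_and_operator

print(inner["expr_type"], inner.as_list())
print(outer["expr_type"], outer.as_list())
```

Because the label is attached inside the group by the expression's own parse action, it does not depend on which parent expression the group is used in. The trade-off is that the extra key also shows up wherever named results are listed, for example in dump() or as_dict().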

Conclusion

This is so hard to do that I wonder whether I have taken the wrong approach. If there are better solutions that I could adapt my existing code to, I'm willing to hear them.
