Using construct_undefined in ruamel from_yaml


I'm creating a custom yaml tag MyTag. It can contain any given valid yaml - map, scalar, anchor, sequence etc.

How do I implement class MyTag to model this tag so that ruamel parses the contents of a !mytag in exactly the same way as it would parse any given yaml? The MyTag instance just stores whatever the parsed result of the yaml contents is.

The following code works, and the asserts, which all pass, demonstrate exactly what it should do.

But I'm not sure it's working for the right reasons. Specifically, in the from_yaml class method, is commented_obj = constructor.construct_undefined(node) a recommended way of achieving this, and is consuming one and only one item from the yielded generator correct? Or is it just working by accident?

Should I instead be using something like construct_object or construct_map? The examples I've been able to find tend to know which type they are constructing, so they use either construct_map or construct_sequence to pick the type of object to construct. In this case I effectively want to piggy-back off the standard ruamel parsing for whatever unknown type there might be in there, and just store the result in my own type.

import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedSeq, TaggedScalar


class MyTag():
    yaml_tag = '!mytag'

    def __init__(self, value):
        self.value = value

    @classmethod
    def from_yaml(cls, constructor, node):
        commented_obj = constructor.construct_undefined(node)
        flag = False
        for data in commented_obj:
            if flag:
                raise AssertionError('should only be 1 thing in generator??')
            flag = True

        return cls(data)


with open('mytag-sample.yaml') as yaml_file:
    yaml_parser = ruamel.yaml.YAML()
    yaml_parser.register_class(MyTag)
    yaml = yaml_parser.load(yaml_file)

custom_tag_with_list = yaml['root'][0]['arb']['k2']
assert type(custom_tag_with_list) is MyTag
assert type(custom_tag_with_list.value) is CommentedSeq
print(custom_tag_with_list.value)

standard_list = yaml['root'][0]['arb']['k3']
assert type(standard_list) is CommentedSeq
assert standard_list == custom_tag_with_list.value

custom_tag_with_map = yaml['root'][1]['arb']
assert type(custom_tag_with_map) is MyTag
assert type(custom_tag_with_map.value) is CommentedMap
print(custom_tag_with_map.value)

standard_map = yaml['root'][1]['arb_no_tag']
assert type(standard_map) is CommentedMap
assert standard_map == custom_tag_with_map.value

custom_tag_scalar = yaml['root'][2]
assert type(custom_tag_scalar) is MyTag
assert type(custom_tag_scalar.value) is TaggedScalar

standard_tag_scalar = yaml['root'][3]
assert type(standard_tag_scalar) is str
assert standard_tag_scalar == str(custom_tag_scalar.value)

And some sample yaml:

root:
  - item: blah
    arb:
      k1: v1
      k2: !mytag
        - one
        - two
        - three-k1: three-v1
          three-k2: three-v2
          three-k3: 123 # arb comment
          three-k4: 
            - a
            - b
            - True
      k3:
        - one
        - two
        - three-k1: three-v1
          three-k2: three-v2
          three-k3: 123 # arb comment
          three-k4: 
            - a
            - b
            - True
  - item: argh
    arb: !mytag
            k1: v1
            k2: 123
            # blah line 1
            # blah line 2
            k3:
              k31: v31
              k32: 
                - False
                - string here
                - 321
    arb_no_tag:
      k1: v1
      k2: 123
      # blah line 1
      # blah line 2
      k3:
        k31: v31
        k32: 
          - False
          - string here
          - 321
  - !mytag plain scalar
  - plain scalar
  - item: no comment
    arb:
      - one1
      - two2

Answer by Anthon (best answer)

In YAML you can have anchors and aliases, and it is perfectly fine to have an object be a child of itself (using an alias). If you want to dump the Python data structure data:

data = [1, 2, 4, dict(a=42)]
data[3]['b'] = data

it dumps to:

&id001
- 1
- 2
- 4
- a: 42
  b: *id001

and for that anchors and aliases are necessary.

When loading such a construct, ruamel.yaml recurses into the nested data structures. But if the top-level node has not yet caused a real object to be constructed, one that the anchor can refer to, the recursive leaf cannot resolve the alias.

To solve that, a generator is used, except for scalar values. It first creates an empty object, then recurses and updates its values. The code calling the constructor checks whether a generator was returned; if so, it calls next() on it to obtain the object, and any potential self-recursion gets "resolved" afterwards.
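The two-step mechanism described above can be sketched with a toy model. This is not ruamel.yaml's code: the node classes and ToyConstructor below are hypothetical stand-ins that use only the standard library, meant to show why yielding the empty container first lets an alias inside the container point back at it.

```python
from dataclasses import dataclass, field
from types import GeneratorType


@dataclass(eq=False)
class ScalarNode:
    """Hypothetical stand-in for a parsed scalar node."""
    value: object


@dataclass(eq=False)
class SeqNode:
    """Hypothetical stand-in for a parsed (anchored) sequence node."""
    items: list = field(default_factory=list)


@dataclass(eq=False)
class AliasNode:
    """Hypothetical stand-in for an alias that refers back to a node."""
    target: SeqNode


class ToyConstructor:
    def __init__(self):
        self.constructed = {}  # id(node) -> object already created for it

    def construct_scalar(self, node):
        # Scalars cannot recurse, so a plain return value is enough.
        return node.value

    def construct_sequence(self, node):
        data = []
        yield data  # step 1: hand out the still-empty container
        for item in node.items:  # step 2: recurse and fill it in
            data.append(self.construct_object(item))

    def construct_object(self, node):
        if isinstance(node, AliasNode):
            # Works only because the target was registered before it was filled.
            return self.constructed[id(node.target)]
        if isinstance(node, ScalarNode):
            result = self.construct_scalar(node)
        else:
            result = self.construct_sequence(node)
        if isinstance(result, GeneratorType):
            data = next(result)  # obtain the empty object
            self.constructed[id(node)] = data
            for _ in result:  # run the rest of the generator to fill it
                pass
        else:
            data = result
        return data


# A sequence whose last element is an alias to the sequence itself.
root = SeqNode()
root.items = [ScalarNode(1), ScalarNode(2), AliasNode(root)]
loaded = ToyConstructor().construct_object(root)
```

In this sketch, loaded[2] is the same list object as loaded. If construct_sequence built and returned the finished list in a single step, the alias lookup would fail because nothing would have been registered for the target yet.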

Because you call construct_undefined(), you always get a generator. In practice that method could return a plain value when it detects a scalar node (which of course cannot recurse), but it doesn't. If it did, your code could not load the following YAML document:

!mytag 1

without modifications that test if you get a generator or not, as is done in the code in ruamel.yaml calling the various constructors so it can handle both construct_undefined and e.g. construct_yaml_int (which is not a generator).
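A minimal sketch of such a modification follows, assuming the goal is a from_yaml that accepts either kind of result. The helper name resolve_constructed and the two stub constructors are hypothetical and exist only for illustration; the only ruamel.yaml name used is construct_undefined, as in the question. The block runs without ruamel.yaml installed.

```python
from types import GeneratorType


def resolve_constructed(result):
    """Hypothetical helper: return the object for a generator or a plain value."""
    if isinstance(result, GeneratorType):
        data = next(result)  # the (possibly still empty) object
        for _ in result:  # run the remainder so the object gets filled in
            pass
        return data
    return result  # not a generator: already the final value


class MyTag:
    yaml_tag = '!mytag'

    def __init__(self, value):
        self.value = value

    @classmethod
    def from_yaml(cls, constructor, node):
        return cls(resolve_constructed(constructor.construct_undefined(node)))


class GeneratorStub:
    """Hypothetical constructor whose method yields, as construct_undefined does."""

    def construct_undefined(self, node):
        data = []
        yield data
        data.extend(node)


class PlainStub:
    """Hypothetical constructor whose method returns a plain value."""

    def construct_undefined(self, node):
        return node


from_generator = MyTag.from_yaml(GeneratorStub(), ['one', 'two'])
from_plain = MyTag.from_yaml(PlainStub(), 1)
```

With a real ruamel.yaml constructor only the generator branch would be exercised today, since construct_undefined always yields. The plain-value branch is there in case that ever changes.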