I'm creating a custom yaml tag MyTag. It can contain any given valid yaml - map, scalar, anchor, sequence etc.
How do I implement class MyTag to model this tag so that ruamel parses the contents of a !mytag
in exactly the same way as it would parse any given yaml? The MyTag
instance just stores whatever the parsed result of the yaml contents is.
The following code works, and the asserts should should demonstrate exactly what it should do and they all pass.
But I'm not sure if it's working for the right reasons. . . Specifically in the from_yaml
class method, is using commented_obj = constructor.construct_undefined(node)
a recommended way of achieving this, and is consuming 1 and only 1 from the yielded generator correct? It's not just working by accident?
Should I instead be using something like construct_object
, or construct_map
or. . .? The examples I've been able to find tend to know what type it is constructing, so would either use construct_map
or construct_sequence
to pick which type of object to construct. In this case I effectively want to piggy-back of the usual/standard ruamel parsing for whatever unknown type there might be in there, and just store it in its own type.
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedSeq, TaggedScalar
class MyTag():
yaml_tag = '!mytag'
def __init__(self, value):
self.value = value
@classmethod
def from_yaml(cls, constructor, node):
commented_obj = constructor.construct_undefined(node)
flag = False
for data in commented_obj:
if flag:
raise AssertionError('should only be 1 thing in generator??')
flag = True
return cls(data)
with open('mytag-sample.yaml') as yaml_file:
yaml_parser = ruamel.yaml.YAML()
yaml_parser.register_class(MyTag)
yaml = yaml_parser.load(yaml_file)
custom_tag_with_list = yaml['root'][0]['arb']['k2']
assert type(custom_tag_with_list) is MyTag
assert type(custom_tag_with_list.value) is CommentedSeq
print(custom_tag_with_list.value)
standard_list = yaml['root'][0]['arb']['k3']
assert type(standard_list) is CommentedSeq
assert standard_list == custom_tag_with_list.value
custom_tag_with_map = yaml['root'][1]['arb']
assert type(custom_tag_with_map) is MyTag
assert type(custom_tag_with_map.value) is CommentedMap
print(custom_tag_with_map.value)
standard_map = yaml['root'][1]['arb_no_tag']
assert type(standard_map) is CommentedMap
assert standard_map == custom_tag_with_map.value
custom_tag_scalar = yaml['root'][2]
assert type(custom_tag_scalar) is MyTag
assert type(custom_tag_scalar.value) is TaggedScalar
standard_tag_scalar = yaml['root'][3]
assert type(standard_tag_scalar) is str
assert standard_tag_scalar == str(custom_tag_scalar.value)
And some sample yaml:
root:
- item: blah
arb:
k1: v1
k2: !mytag
- one
- two
- three-k1: three-v1
three-k2: three-v2
three-k3: 123 # arb comment
three-k4:
- a
- b
- True
k3:
- one
- two
- three-k1: three-v1
three-k2: three-v2
three-k3: 123 # arb comment
three-k4:
- a
- b
- True
- item: argh
arb: !mytag
k1: v1
k2: 123
# blah line 1
# blah line 2
k3:
k31: v31
k32:
- False
- string here
- 321
arb_no_tag:
k1: v1
k2: 123
# blah line 1
# blah line 2
k3:
k31: v31
k32:
- False
- string here
- 321
- !mytag plain scalar
- plain scalar
- item: no comment
arb:
- one1
- two2
In YAML you can have anchors and aliases, and it is perfectly fine to have an object be a child of itself (using an alias). If you want to dump the Python data structure
data
:it dumps to:
and for that anchors and aliases are necessary.
When loading such a construct, ruamel.yaml recurses into the nested data structures, but if the toplevel node has not caused a real object to be constructed to which the anchor can be made a reference, the recursive leaf cannot resolve the alias.
To solve that, a generator is used, except for scalar values. It first creates an empty object, then recurses and updates it values. In code calling the constructor a check is made to see if a generator is returned, and in that case
next()
is done on the data, and potential self-recursion "resolved".Because you call
construct_undefined()
, you always get a generator. Practically that method could return a value if it detects a scalar node (which of course cannot recurse), but it doesn't. If it would, your code could then not load the following YAML document:without modifications that test if you get a generator or not, as is done in the code in ruamel.yaml calling the various constructors so it can handle both
construct_undefined
and e.g.construct_yaml_int
(which is not a generator).