I am using ruamel.yaml to parse a complex YAML document where certain tagged nodes require special treatment. I inject my custom parsing logic using add_multi_constructor, as recommended by the published examples. The problem is that I need to change the injected logic dynamically depending on external states but the decoration methods like add_multi_constructor modify the global state which introduces unacceptable coupling between logically unrelated instances. Here is the MWE:
import ruamel.yaml

def get_loader(parameter):
    def construct_node(constructor: ruamel.yaml.Constructor, tag: str, node: ruamel.yaml.Node):
        return parameter(tag.lstrip("!"), str(node.value))

    loader = ruamel.yaml.YAML()
    loader.constructor.add_multi_constructor("", construct_node)
    return loader

foo = get_loader(lambda tag, node: f"foo: {tag}, {node}")
bar = get_loader(lambda tag, node: f"bar: {tag}, {node}")
print(foo.load("!abc 123"), bar.load("!xyz 456"), sep="\n")
Output:
bar: abc, 123
bar: xyz, 456
Expected:
foo: abc, 123
bar: xyz, 456
I made the following workaround where I create new class instances dynamically to break the coupling:
def get_loader(parameter):
    def construct_node(constructor: ruamel.yaml.Constructor, tag: str, node: ruamel.yaml.Node):
        return parameter(tag.lstrip("!"), str(node.value))

    # Create a new class to prevent state sharing through class attributes.
    class ConstructorWrapper(ruamel.yaml.constructor.RoundTripConstructor):
        pass

    loader = ruamel.yaml.YAML()
    loader.Constructor = ConstructorWrapper
    loader.constructor.add_multi_constructor("", construct_node)
    return loader
My questions are:
Am I misusing the library? The global effects are a huge red flag which suggests that I am using the API incorrectly, but the library lacks any API documentation so I am not sure what would be the correct way.
Is it safe in the sense of API breakage? Since there is no documented API for this, I am not sure if this is safe to put into production.
IMO you are not misusing the library, just working around its current shortcomings/incompleteness.
Before ruamel.yaml got the API with the YAML() instance, it had the function-based API of PyYAML with a few extensions, and other PyYAML problems had to be worked around in a similarly unnatural way. E.g. I reverted to having classes whose instances could be called (using __call__()), on which methods could then be changed, just to have access to the YAML version parsed from a document (as ruamel.yaml supports YAML 1.2 and 1.1, and PyYAML only (partially) supports 1.1).

But underneath ruamel.yaml's YAML() instance not everything has improved. The code inherited from PyYAML stores the information for the various constructors in class attributes as lookup tables (yaml_constructors and yaml_multi_constructors respectively), and ruamel.yaml still does that (as the full old PyYAML-esque API is effectively still there, and only with version 0.17 has it gotten a future deprecation warning).

Your approach is interesting in that you call add_multi_constructor() on loader.constructor instead of on loader.Constructor, or even directly on the newly created class. (You probably know that loader.constructor is a property that instantiates loader.Constructor if necessary, but other readers of this answer might not.)
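The mechanism can be illustrated without ruamel.yaml at all. The following is a simplified stand-in modelled on the PyYAML-style registration, not ruamel.yaml's actual code: the class name, the handler functions and the two wrapper classes are hypothetical, and the copy-on-first-registration step is my reading of how the inherited code behaves. It shows why calling the classmethod through an instance still changes the class, and why a fresh subclass per loader breaks the coupling.

```python
class BaseConstructor:
    # Class-level lookup table, shared by every instance of this class.
    yaml_multi_constructors = {}

    @classmethod
    def add_multi_constructor(cls, tag_prefix, multi_constructor):
        # A class that has no table of its own yet gets a copy first,
        # so a subclass never writes into its parent's table.
        if "yaml_multi_constructors" not in cls.__dict__:
            cls.yaml_multi_constructors = cls.yaml_multi_constructors.copy()
        cls.yaml_multi_constructors[tag_prefix] = multi_constructor


def handler_foo(tag, value):
    return f"foo: {tag}, {value}"


def handler_bar(tag, value):
    return f"bar: {tag}, {value}"


# Calling the classmethod through two instances updates the one class
# table, so the second registration replaces the first for both.
first = BaseConstructor()
second = BaseConstructor()
first.add_multi_constructor("", handler_foo)
second.add_multi_constructor("", handler_bar)


# One subclass per loader: each subclass ends up with its own table.
class WrapperA(BaseConstructor):
    pass


class WrapperB(BaseConstructor):
    pass


wrapped_a = WrapperA()
wrapped_b = WrapperB()
wrapped_a.add_multi_constructor("", handler_foo)
wrapped_b.add_multi_constructor("", handler_bar)
```

After this runs, first and second both see handler_bar, while wrapped_a and wrapped_b keep their own handlers, which matches the output and the workaround in the question.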
That your code works is because the constructors are stored in a class attribute, as add_multi_constructor() is a class method.

So what you do is not entirely safe in the sense of API breakage. ruamel.yaml is not at version 1.0 yet, and (API) changes that potentially break your code could come with any minor version number change. You should set your version dependencies accordingly for your production code (e.g. ruamel.yaml<0.18), and update that minor number only after testing with a ruamel.yaml version with a new minor version number.

It is possible to transparently change the use of the class attributes by updating the classmethods add_constructor() and add_multi_constructor() to "normal" methods and have the initialisation of the lookup tables done in __init__(). Both your examples that call add_multi_constructor() on the instance would then get the desired result, but ruamel.yaml's behaviour would not change when calling add_multi_constructor on the class.

However, changing the classmethods add_constructor() and add_multi_constructor() in this way affects all code out there that happens to provide an instance instead of the class (and said code being fine with the result).

It is more likely that two new instance methods will be added to the Constructor class and/or to the YAML() instance, and that the class methods will be either phased out or changed to check that a class and not an instance is being passed in, after a deprecation period with warnings (as will the global functions add_constructor() and add_multi_constructor() inherited from PyYAML).

The main advice, apart from having your production code fixed on the minor version number, is to make sure your testing code displays PendingDeprecationWarning. If you are using pytest this is the case by default. That should give you ample time to adapt your code to what the warning recommends.

And if ruamel.yaml's author stops being lazy, he might provide some documentation for such API additions/changes.
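To make the possible direction described above more concrete, here is a minimal sketch of per-instance lookup tables next to a legacy classmethod that emits a PendingDeprecationWarning. Everything in it is an assumption for illustration: SketchConstructor, register_multi_constructor() and construct() are hypothetical names and not part of ruamel.yaml, and the real change may look quite different. The last part shows one way to make such a warning visible outside pytest, using only the standard warnings module.

```python
import warnings


class SketchConstructor:
    # Legacy class-level table, kept for the old registration style.
    yaml_multi_constructors = {}

    def __init__(self):
        # Per-instance table, seeded from the class-level one.
        self.multi_constructors = dict(type(self).yaml_multi_constructors)

    @classmethod
    def add_multi_constructor(cls, tag_prefix, multi_constructor):
        # Old style: still works, but announces that it may go away.
        warnings.warn(
            "class-level registration may be phased out; "
            "use register_multi_constructor() on an instance",
            PendingDeprecationWarning,
            stacklevel=2,
        )
        cls.yaml_multi_constructors[tag_prefix] = multi_constructor

    def register_multi_constructor(self, tag_prefix, multi_constructor):
        # New style (hypothetical): only this instance is affected.
        self.multi_constructors[tag_prefix] = multi_constructor

    def construct(self, tag, value):
        # Use the first registered prefix that matches the tag.
        for prefix, handler in self.multi_constructors.items():
            if tag.startswith(prefix):
                return handler(tag.lstrip("!"), value)
        raise KeyError(tag)


foo = SketchConstructor()
bar = SketchConstructor()
foo.register_multi_constructor("", lambda tag, value: f"foo: {tag}, {value}")
bar.register_multi_constructor("", lambda tag, value: f"bar: {tag}, {value}")
foo_result = foo.construct("!abc", "123")
bar_result = bar.construct("!xyz", "456")

# Outside pytest, PendingDeprecationWarning is normally ignored, so
# enable it explicitly while exercising the legacy call.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", PendingDeprecationWarning)
    SketchConstructor.add_multi_constructor("!legacy", str)
```

With instance-level tables the two loaders no longer influence each other, and the recorded warning is what a test run should surface well before the old call path is removed.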