Preserve anchor names when dumping class with ruamel.yaml and filtering through __getstate__ and __setstate__


I have classes that are set up to be loaded/dumped using the yaml_object decorator from ruamel.yaml 0.17.21.

For reasons specific to my application, I don't want changes made to instances of my class to be saved to YAML unless the user specifically requests it. To do that, I use the __getstate__ and __setstate__ methods to cache the initial state (in _permanent_state), and the user is directed to modify that object if they want a permanent change.
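ruamel.yaml's yaml_object support calls the same-named __getstate__/__setstate__ hooks that Python's copy and pickle modules use, so the snapshot pattern described above can be illustrated with the standard library alone. A minimal sketch, with copy.deepcopy standing in for a YAML dump/load round trip; the class name Snapshot is hypothetical:

```python
from copy import deepcopy


class Snapshot:
    """Hypothetical class: only _permanent_state is ever handed out as state."""

    def __init__(self, **state):
        # Simulate "loading" by going through the same hook a loader would use.
        self.__setstate__(state)

    def __setstate__(self, state):
        # Working attributes get their own copy; the dict itself is kept
        # as the permanent state that __getstate__ returns later.
        self.__dict__.update(deepcopy(state))
        self._permanent_state = state

    def __getstate__(self):
        return self._permanent_state


obj = Snapshot(potato=10, turnip=20)
obj.potato = 10**6                    # runtime change: not part of the state
obj._permanent_state['turnip'] = 21   # permanent change: part of the state

# deepcopy calls __getstate__ on obj and __setstate__ on the new instance.
restored = deepcopy(obj)
print(restored.potato, restored.turnip)  # 10 21
```

Only the edit made through _permanent_state survives the round trip, which is the behaviour the question wants from the YAML dump.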

Everything works as expected, except that the anchor names are lost when dumping. I was not able to pinpoint how ruamel.yaml normally preserves them when the __getstate__ and __setstate__ methods are not defined.

Those easy-to-read anchor names are very important to me as a user-friendly feature.

Here's an example:

import sys
from copy import deepcopy
from ruamel.yaml import YAML, yaml_object

yaml = YAML()

@yaml_object(yaml)
class ExampleClass():

    def __init__(self):
        self._permanent_state = {}

    def __setstate__(self, state):
        self.__dict__.update(deepcopy(state))
        self._permanent_state = state

    def __getstate__(self):
        return self._permanent_state
        

source = """
simple: &simple !ExampleClass
    potato: 10
    turnip: 20

nested: !ExampleClass
    sunflower: 30
    others: *simple
"""
a = yaml.load(source)
a['simple'].potato = 10**6 # Should not be reflected in dump
a['nested']._permanent_state['sunflower'] = 100 # Should be reflected in dump
yaml.dump(a, sys.stdout)

In the output, the simple anchor is renamed to id001:

simple: &id001 !ExampleClass
  potato: 10
  turnip: 20
nested: !ExampleClass
  sunflower: 100
  others: *id001

If I remove the __getstate__ and __setstate__ methods:

import sys
from copy import deepcopy
from ruamel.yaml import YAML, yaml_object

yaml = YAML()

@yaml_object(yaml)
class ExampleClass():
    pass

source = """
simple: &simple !ExampleClass
    potato: 10
    turnip: 20

nested: !ExampleClass
    sunflower: 30
    others: *simple
"""
a = yaml.load(source)
a['simple'].potato = 10**6   # Should not be reflected in dump
a['nested']._permanent_state = deepcopy(a['nested'].__dict__)
a['nested']._permanent_state['sunflower'] = 100  # Should not be reflected in dump
yaml.dump(a, sys.stdout)

The output now preserves the simple anchor, but the desired behaviour is lost:

simple: &simple !ExampleClass
  potato: 1000000
  turnip: 20
nested: !ExampleClass
  sunflower: 30
  others: *simple
  _permanent_state:
    sunflower: 100
    others: !ExampleClass
      potato: 1000000
      turnip: 20

Ideally I would like the behaviour of the first example, while preserving the anchor name.

Answer by Anthon:

Had you not provided a counterexample, I would have guessed the anchor would not be preserved on an instance of any registered class.

The presence of __setstate__ forces a different execution path in the constructor, in which the anchor information is not actively dropped, but just not used. It is probably possible to subclass (and use) the RoundTripConstructor and RoundTripRepresenter to store and retrieve the anchor information available in the node. You will lose comments that are "part of" your tagged mapping in the same way.

It is probably easiest to load the YAML document without registering ExampleClass, then recursively walk over the loaded data, looking for "dict" (subclass) instances that have the .tag attribute (actually a property) set to !ExampleClass, and replace those with an appropriately behaving instance (or possibly add the specific behaviour you want, using duck-typing).
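The walk-and-replace step could look roughly like the sketch below. To keep it self-contained it uses a hypothetical TaggedDict as a stand-in for the tagged CommentedMap that an unregistered tag loads as, compares .tag to a plain string (the real property returns a tag object, so that comparison would need adapting), and uses a hypothetical Wrapped class as the replacement. The memo matters: an anchored mapping reached twice through an alias must be replaced by the same new object, or the alias is lost.

```python
from typing import Any, Callable, Dict, Optional, Tuple


class TaggedDict(dict):
    """Hypothetical stand-in for a loaded mapping that carries a .tag."""

    def __init__(self, tag: str, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.tag = tag


def replace_tagged(data: Any, tag: str, factory: Callable[[dict], Any],
                   memo: Optional[Dict[int, Tuple[Any, Any]]] = None) -> Any:
    """Recursively replace dicts whose .tag == tag with factory(mapping).

    memo maps id(original) -> (original, replacement); keeping the original
    alive prevents id reuse. Self-referencing structures are not handled.
    """
    if memo is None:
        memo = {}
    if id(data) in memo:
        return memo[id(data)][1]   # seen before (e.g. via an alias): reuse it
    if isinstance(data, dict):     # dict subclasses too
        for key, value in list(data.items()):
            data[key] = replace_tagged(value, tag, factory, memo)
        result = factory(data) if getattr(data, 'tag', None) == tag else data
    elif isinstance(data, list):   # list subclasses too
        for i, value in enumerate(list(data)):
            data[i] = replace_tagged(value, tag, factory, memo)
        result = data
    else:
        return data                # scalars are left as they are
    memo[id(data)] = (data, result)
    return result


class Wrapped:
    """Hypothetical replacement type; a real one would add the desired behaviour."""

    def __init__(self, mapping: dict) -> None:
        self.mapping = mapping


simple = TaggedDict('!ExampleClass', potato=10, turnip=20)
doc = {'simple': simple,
       'nested': TaggedDict('!ExampleClass', sunflower=30, others=simple)}
result = replace_tagged(doc, '!ExampleClass', Wrapped)
print(result['nested'].mapping['others'] is result['simple'])  # True
```

Whether the anchor name survives the dump then depends on the replacement type: wrapping the original mapping, as here, at least keeps the loaded object (and any anchor stored on it) reachable.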