Creating a yaml file with aliases through PyYAML


I need to create a yaml file with the following format:

imager: &imager
  type: "detector"
  half_angle: 75 degrees
  max_distance: 23000 meters
ownship: &ownship
  origin: [11,11,5]
  type: "uav"

vehicles:
  - <<: *ownship
  name: "uav1"
  origin: [35.69257148103399 degrees, -117.689417544709 degrees, 5500]
  sensors:
    - <<: *imager
      name: "imager1"

I have all the specific data stored in Python classes, so I figured I'd use PyYAML to make things easy. However, when I read the documentation, I saw no mention of how to handle aliases with PyYAML. Does this functionality exist, or should I just go ahead and write my own YAML writer?


There are 2 answers

Anthon (best answer)

First of all, the file you specify is not correct YAML. It will not load, because here:

- <<: *ownship
name: "uav1"

you put a sequence entry and a mapping key at the same indentation level, and that is not allowed. If you remove the - from the first line, you get a correct YAML file:

imager: &imager
  type: "detector"
  half_angle: 75 degrees
  max_distance: 23000 meters
ownship: &ownship
  origin: [11,11,5]
  type: "uav"

vehicles:
  <<: *ownship
  name: "uav1"
  origin: [35.69257148103399 degrees, -117.689417544709 degrees, 5500]
  sensors:
    - <<: *imager
      name: "imager1"

You cannot, however, generate that file with PyYAML.
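A quick way to check the claim about the original file is to feed both versions to PyYAML. This is a sketch that assumes PyYAML is installed; parses() is a hypothetical helper that only reports whether yaml.safe_load succeeds.

```python
import yaml

# The layout from the question: a sequence entry ("- <<: *ownship")
# followed by a mapping key ("name:") at the same indentation.
ORIGINAL = """\
imager: &imager
  type: "detector"
  half_angle: 75 degrees
  max_distance: 23000 meters
ownship: &ownship
  origin: [11,11,5]
  type: "uav"

vehicles:
  - <<: *ownship
  name: "uav1"
  origin: [35.69257148103399 degrees, -117.689417544709 degrees, 5500]
  sensors:
    - <<: *imager
      name: "imager1"
"""

# The same document with the "-" removed, so "vehicles" becomes a mapping.
CORRECTED = ORIGINAL.replace("  - <<: *ownship", "  <<: *ownship")


def parses(text):
    """Hypothetical helper: True if PyYAML can load `text`, else False."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        return False
    return True


print(parses(ORIGINAL))   # False: parser error in the "vehicles" block
print(parses(CORRECTED))  # True
```

The original fails with a parser error (PyYAML expects the end of the sequence but finds a mapping key), while the corrected version loads and resolves both merge keys.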

PyYAML does support anchors and aliases for both reading and writing, and it supports the merge key << when reading. It does not, however, support writing the merge key.
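The asymmetry between reading and writing is easy to see with a small document. This sketch assumes PyYAML is installed; the mapping names are illustrative.

```python
import yaml

DOC = """\
ownship: &ownship
  origin: [11, 11, 5]
  type: uav
vehicle:
  <<: *ownship
  name: uav1
"""

data = yaml.safe_load(DOC)

# Reading: the merge key is resolved, so the keys of the anchored
# mapping appear directly in data["vehicle"] (merged keys come first).
print(data["vehicle"])  # {'origin': [11, 11, 5], 'type': 'uav', 'name': 'uav1'}

# Writing: PyYAML only sees the resulting plain dict, so no "<<" is
# written and the merged keys are spelled out in full.
out = yaml.safe_dump(data)
print(out)
```

Because PyYAML hands back the very same list object for both origin values, the dump may still contain an &id001 / *id001 pair for that list; that is ordinary shared-object aliasing, not a merge.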

Writing merge keys would require comparing dictionaries to determine whether one is a complete subset of another (every key of the one present in the other with the same value), then putting an anchor on the subset, writing the merge key in the other dictionary and leaving out the keys that the subset already provides. PyYAML has no code to do this; it is much more complicated than putting anchors and aliases on shared complex objects (dicts, lists, etc.), which PyYAML does support.
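The comparison step can be sketched in plain Python on top of PyYAML, which is roughly what a home-grown writer would have to do. is_merge_base(), indent_block() and dump_with_merge() are hypothetical helpers, not PyYAML API. The sketch handles only one base mapping merged into one derived mapping, uses the strict complete-subset test described above, and assumes PyYAML 5.1 or later (for sort_keys).

```python
import yaml


def is_merge_base(base, derived):
    """Hypothetical helper: True if every key of `base` is in `derived`
    with an equal value, i.e. `base` is a complete subset of `derived`."""
    return all(k in derived and derived[k] == v for k, v in base.items())


def indent_block(text, prefix="  "):
    """Indent every line of a YAML fragment produced by safe_dump."""
    return "".join(prefix + line for line in text.splitlines(keepends=True))


def dump_with_merge(base_name, base, derived_name, derived):
    """Hypothetical writer: emit `base` under an anchor and `derived` with
    '<<: *base_name', leaving out the keys the merge already supplies."""
    if not is_merge_base(base, derived):
        raise ValueError(f"{base_name} is not a subset of {derived_name}")
    rest = {k: v for k, v in derived.items() if k not in base}
    out = f"{base_name}: &{base_name}\n"
    out += indent_block(yaml.safe_dump(base, sort_keys=False))
    out += f"{derived_name}:\n  <<: *{base_name}\n"
    if rest:  # safe_dump({}) would emit "{}", so skip an empty remainder
        out += indent_block(yaml.safe_dump(rest, sort_keys=False))
    return out


ownship = {"origin": [11, 11, 5], "type": "uav"}
uav1 = {"origin": [11, 11, 5], "type": "uav", "name": "uav1"}

text = dump_with_merge("ownship", ownship, "uav1", uav1)
print(text)

# Reading the result back with PyYAML restores the full mapping.
assert yaml.safe_load(text)["uav1"] == uav1
```

A YAML merge also lets the derived mapping override merged keys (the question's vehicle overrides origin); this strict subset test deliberately does not try to detect that case.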

My ruamel.yaml, which has a superset of PyYAML's functionality, supports round-tripping such data starting with version 0.10. It does some "normalisation" on the first round-trip:

imager: &imager
  type: detector
  half_angle: 75 degrees
  max_distance: 23000 meters
ownship: &ownship
  origin: [11, 11, 5]
  type: uav
vehicles:
  <<: *ownship
  name: uav1
  origin: [35.69257148103399 degrees, -117.689417544709 degrees, 5500]
  sensors:
  - <<: *imager
    name: imager1

It is easy to read in that YAML, manipulate the resulting data structure and then write it out. Assigning a key sets it on the dictionary you reference; retrieving a value transparently falls back to the first merged dictionary if the key is not in the referenced dictionary.

Creating such a structure from scratch and then dumping it is more difficult, as there are no supporting routines (yet) that create merges by comparing keys and values between dictionaries.

larsks

It looks as if PyYAML does the right thing if your Python data structure contains multiple references to the same object. Consider, for example, this:

>>> a = {'name': 'bob', 'office': '100'}
>>> b = {'president': a, 'vice-president': a}
>>> b
{'president': {'name': 'bob', 'office': '100'}, 'vice-president': {'name': 'bob', 'office': '100'}}
>>> import yaml
>>> print(yaml.dump(b, default_flow_style=None))  # flow style, the default in older PyYAML
president: &id001 {name: bob, office: '100'}
vice-president: *id001

PyYAML has recognized that the values of both the 'president' and 'vice-president' keys are the same object, so it has written that mapping once with an anchor (&id001) and referred to it elsewhere with an alias (*id001).
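The aliasing is based on object identity, not equality. In this sketch (assuming PyYAML is installed), two equal but distinct dicts are written out in full, while one dict referenced twice gets an anchor and an alias. To get anchors in the question's file, the Python objects would therefore need to share the same dict instances.

```python
import yaml

a = {"name": "bob", "office": "100"}

# The same object twice: PyYAML emits an anchor and an alias.
shared = {"president": a, "vice-president": a}

# Equal but distinct objects: no anchor, the mapping is written twice.
copies = {"president": dict(a), "vice-president": dict(a)}

shared_out = yaml.dump(shared, default_flow_style=None)
copies_out = yaml.dump(copies, default_flow_style=None)
print(shared_out)
print(copies_out)
```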