Adding comments to yaml after a key

I am using ruamel.yaml version 0.18.5 and python 3.10.4

My problem centres around adding (and removing) comments from yaml files. In particular I want to ensure that the end of every top-level block of data has a newline after it, and the same for every block that conforms to the same schema.

Consider for example the following:

import ruamel.yaml
import sys

from pathlib import Path
from ruamel.yaml import YAML

yaml = YAML()
yaml.preserve_quotes = True
yaml.encoding = True
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.width = 120

yaml_test_data = """
name: My Application
description: >
  A multiline description of what my app does
  
bcp:
  tier: 2
  rto: PT1H
  rpo: PT1H

environment:
  - name: EMEA
    host: [abc*]
  - name: APAC
    host: [def*]

data:
  class: [INT, PUB]
  categories: [REF, MKT]
"""

testdata = yaml.load(yaml_test_data)

Now, I want to do the following:

  1. remove the newline after the "rpo" item
  2. add another item to the bcp block
  3. add a comment after this new item
  4. insert a comment after the first item in the environment block

To test what's possible I've tried to use the yaml_set_comment_before_after_key() function. As the author freely admits, the documentation isn't complete so I'm not sure if this is the right approach.

testdata['bcp'].yaml_set_comment_before_after_key(key='rpo', after="")
testdata['bcp']['new_item'] = "new data"
testdata['bcp'].yaml_set_comment_before_after_key(key='new_item', 
                                                  before="This is a comment before 'new_item' key", 
                                                  after="This is a comment after 'new_item' key",
                                                  )
testdata.yaml_set_comment_before_after_key(key='name', 
                                           before="comment before key 'name'", 
                                           after="comment after key 'name'")
testdata['environment'][0].yaml_set_comment_before_after_key(key='host', 
                                                             after="comment after 1st host")

When I inspect the comments attached to 'bcp' via testdata['bcp'].ca, I now get this:

Comment(
  start=None,
  items={
    rpo:      [None, None, CommentToken('\n\n', line: 8, col: 7), None]
    new_item: [None, [CommentToken("# This is a comment before 'new_item' key\n", col: 0)], None, [CommentToken("# This is a comment after 'new_item' key\n", col: 2)]]
  })

The comment has not been removed from the 'rpo' item, but it looks like the before and after comments I added to the 'new_item' are in place. However, when dumping it to the screen via yaml.dump(testdata, sys.stdout) the 'after' comments aren't there in either the 'bcp' block or the 'name' field and it also splits the data for the 1st 'host' field:

# comment before key 'name'
name: My Application
description: >
  A multiline description of what my app does

bcp:
  tier: 2
  rto: PT1H
  rpo: PT1H

# This is a comment before 'new_item' key
  new_item: new data
environment:
  - name: EMEA
    host:
  # comment after 1st host
        [abc*]
  - name: APAC
    host: [def*]

data:
  class: [INT, PUB]
  categories: [REF, MKT]

This is a simplified example for the 100s of yaml files I want to update. I want to enforce some better formatting by checking/removing comments where necessary and adding a newline after each (top-level) block of data.

What do I need to do to get the code working?

There is 1 answer

Answered by Anthon

I first repeat here two things I have written in other answers:

  1. The comment-changing routines are subject to change; they are not a public API, so pin the version of ruamel.yaml that you are using.
  2. Try to round-trip the expected output, check the output to be correct, and if so analyse the output.

You don't indicate that you did step 2, so let's try that first (I hope I got the expected output correct).

import sys
import ruamel.yaml

yaml_str = """\
# comment before key 'name'
name: My Application
# comment after key 'name'
description: >
  A multiline description of what my app does
  
bcp:
  tier: 2
  rto: PT1H
  rpo: PT1H
# This is a comment before 'new_item' key
  new_item: new data
# This is a comment after 'new_item' key
environment:
  - name: EMEA
    host: [abc*]
# comment after 1st host
  - name: APAC
    host: [def*]

data:
  class: [INT, PUB]
  categories: [REF, MKT]
"""
    
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=4, offset=2)
yaml.preserve_quotes = True
yaml.width = 120
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

which gives:

# comment before key 'name'
name: My Application
# comment after key 'name'
description: >
    A multiline description of what my app does

bcp:
    tier: 2
    rto: PT1H
    rpo: PT1H
# This is a comment before 'new_item' key
    new_item: new data
# This is a comment after 'new_item' key
environment:
  - name: EMEA
    host: [abc*]
# comment after 1st host
  - name: APAC
    host: [def*]

data:
    class: [INT, PUB]
    categories: [REF, MKT]

Since that matches the input (apart from the mapping indentation, which follows the yaml.indent() setting), it should be possible to construct that output from the data loaded from your input.

You indicate that you analysed both the input and the data updated with your changes, but you probably didn't look at the data loaded from the expected output, so let's do that. Keep in mind that accessing the .ca attribute on a loaded collection node (sequence or mapping) creates an (empty) comment if there previously was none:

print('1:', data.ca)
print('2:', data['bcp'].ca)
print('3:', data['environment'][0]['host'].ca)

which gives:

1: Comment(comment=[None, [CommentToken("# comment before key 'name'\n", line: 0, col: 0)]],
  items={'name': [None, None, CommentToken("\n# comment after key 'name'\n", line: 2, col: 0), None], 'description': [None, None, CommentToken('\n', line: 6, col: 0), None], 'environment': [None, None, None, None]})
2: Comment(comment=None,
  items={'rpo': [None, None, CommentToken("\n# This is a comment before 'new_item' key\n", line: 10, col: 0), None], 'new_item': [None, None, CommentToken("\n# This is a comment after 'new_item' key\n", line: 12, col: 0), None]})
3: Comment(comment=None,
  items={})

Let's start with the last one (3:). As you can see, there is no comment string there. The reason is that ruamel.yaml tends to gather a comment and attach it to the next node it fully parses; in this case that is the second item (index 1) of the sequence that is the value for the key environment:

print('4:', data['environment'].ca)

which gives:

4: Comment(comment=[None, None],
  items={1: [None, [CommentToken('# comment after 1st host\n', line: 16, col: 0)], None, None]},
  end=[CommentToken('\n', line: 19, col: 0)])

So you should attach your comment to data['environment'], although you can probably also attach it to the list that is the value for host. Your code places the comment between the key host and its value because that value is not a scalar.

You cannot assign an empty comment as that will give you an empty line. Instead delete the comment entry altogether.

Starting with your input:

import sys
import ruamel.yaml
from ruamel.yaml.tokens import CommentToken
from ruamel.yaml.error import CommentMark

yaml_str = """\
name: My Application
description: >
  A multiline description of what my app does
  
bcp:
  tier: 2
  rto: PT1H
  rpo: PT1H

environment:
  - name: EMEA
    host: [abc*]
  - name: APAC
    host: [def*]

data:
  class: [INT, PUB]
  categories: [REF, MKT]
"""
    
yaml = ruamel.yaml.YAML()
yaml.indent(mapping=4, sequence=4, offset=2)
yaml.preserve_quotes = True
yaml.width = 120
data = yaml.load(yaml_str)

del data['bcp'].ca.items['rpo']
data['bcp']['new_item'] = "new data"
if False:
    # comment before and after the key; the "after" assignment relies on the
    # four-slot list that setting the "before" comment creates for this key
    data['bcp'].yaml_set_comment_before_after_key(
        key='new_item', 
        before="This is a comment before 'new_item' key",
    )
    data['bcp'].ca.items['new_item'][2] = CommentToken("\n# This is a comment after 'new_item' key", CommentMark(0))
else:
    # comment after the key only; note that items[...][2] is a single token, not a list
    data['bcp'].ca.items['new_item'] = [None, None, CommentToken("\n# This is a comment after 'new_item' key", CommentMark(0)), None]
# the round-trip showed the comment in the 2nd slot of items[1]; here we use the 3rd slot of items[0] instead
# note the list and the final newline in the comment string
data['environment'].ca.items[0] = [None, None, [CommentToken("# comment after 1st host\n", CommentMark(0))], None]
yaml.dump(data, sys.stdout)

which gives:

name: My Application
description: >
    A multiline description of what my app does

bcp:
    tier: 2
    rto: PT1H
    rpo: PT1H
    new_item: new data
# This is a comment after 'new_item' key
environment:
  - name: EMEA
    host: [abc*]
# comment after 1st host
  - name: APAC
    host: [def*]

data:
    class: [INT, PUB]
    categories: [REF, MKT]

I suggest you make some wrapper routines, so that when the internals change, you "only" have to update those.

Other things to consider:

  • I am not sure why you set yaml.encoding to True; it defaults to utf-8 and you should probably leave it at that.
  • The line import ruamel.yaml is superfluous, since you only use the YAML imported by from ruamel.yaml import YAML.
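For the broader goal in the question (a blank line after every top-level block), a pass over the dumped text may be less fragile than editing comment internals. The following is a different technique from the one shown above, and only a sketch: blank_line_after_blocks is a hypothetical function that assumes the top level is a block-style mapping and that nested content is indented.

```python
def blank_line_after_blocks(text: str) -> str:
    """Insert a blank line before a top-level key that directly follows an indented line."""
    out = []
    for line in text.splitlines():
        # A top-level key line starts in column 0 and is not a comment.
        starts_top_level = bool(line) and not line[0].isspace() and not line.startswith("#")
        # The previous line belongs to a nested block if it is indented and non-blank.
        follows_nested = bool(out) and out[-1][:1].isspace() and out[-1].strip() != ""
        if starts_top_level and follows_nested:
            out.append("")
        out.append(line)
    return "\n".join(out) + "\n" if out else ""


dumped = (
    "bcp:\n"
    "  tier: 2\n"
    "  rpo: PT1H\n"
    "environment:\n"
    "  - name: EMEA\n"
    "\n"
    "data:\n"
    "  class: [INT]\n"
)
print(blank_line_after_blocks(dumped))
```

Because the function maps a string to a string, it can be passed as the transform argument of yaml.dump(), for example yaml.dump(data, sys.stdout, transform=blank_line_after_blocks). It leaves column-0 comment lines alone, so a block that ends with such a comment gets no blank line from this sketch.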