How can I add a comment to a YAML file in Python


I am writing a YAML file using https://pypi.python.org/pypi/ruamel.yaml

The code is like this:

import ruamel.yaml
from ruamel.yaml.comments import CommentedSeq

d = {}
for m in ['B1', 'B2', 'B3']:
    d2 = {}
    for f in ['A1', 'A2', 'A3']:
        d2[f] = CommentedSeq(['test', 'test2'])
        if f != 'A2':
            d2[f].fa.set_flow_style()
    d[m] = d2

# write the file once, after the whole structure has been built
with open('test.yml', "w") as f:
    ruamel.yaml.dump(
        d, f, Dumper=ruamel.yaml.RoundTripDumper,
        default_flow_style=False, width=50, indent=8)

I just want to add a comment at the top, before the YAML data, like:

# Data for Class A


Accepted answer by dimo414

Within your with block, you can write anything you want to the file. Since you just need a comment at the top, add a call to f.write() before you call ruamel.yaml.dump():

with open('test.yml', "w") as f:
    f.write('# Data for Class A\n')
    ruamel.yaml.dump(
        d, f, Dumper=ruamel.yaml.RoundTripDumper,
        default_flow_style=False, width=50, indent=8)
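If the header may span several lines, the same idea can be wrapped in a small helper. This is only a sketch: `write_yaml_with_header` and its `dump` parameter are hypothetical names, and `dump` is assumed to be any callable that writes `data` to an open stream. A stand-in dumper is used so the example runs without ruamel.yaml.

```python
import io


def write_yaml_with_header(stream, header, data, dump):
    """Write `header` as YAML comment lines to `stream`, then dump `data`.

    `dump` is a hypothetical callable taking (data, stream); any function
    that serializes YAML to an already open stream fits.
    """
    for line in header.rstrip('\n').split('\n'):
        # rstrip() turns an empty header line into a bare '#'
        stream.write(('# ' + line).rstrip() + '\n')
    dump(data, stream)


# Stand-in for the real YAML dumper, only so this sketch is self-contained.
def fake_dump(data, stream):
    for key, value in data.items():
        stream.write('{}: {}\n'.format(key, value))


buf = io.StringIO()
write_yaml_with_header(buf, 'Data for Class A\nGenerated file', {'B1': 1}, fake_dump)
print(buf.getvalue())
```

With the question's data you would pass the opened file as `stream` and a lambda wrapping the `ruamel.yaml.dump(...)` call above as `dump`.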
Answer by Anthon

That is possible in principle, because you can round-trip such "start-of-file" comments, but it is not nicely supported in the current ruamel.yaml 0.10, and certainly not when "starting from scratch" (i.e. not changing an existing file). At the bottom is an easy and relatively nice solution, but I would first like to present an ugly workaround and a step-by-step explanation of how to get this done.

Ugly:
The ugly way to do this is to just add the comment to the file before you write the YAML data to it. That is, insert:

f.write('# Data for Class A\n')

just before ruamel.yaml.dump(...)

Step by step:
To insert the comment on the data structure, so that the above hack is not necessary, you first need to make sure your d data is a CommentedMap type. You can then see what is missing by comparing that d variable with one that does have the comment, obtained by loading the commented YAML back into c:

import ruamel.yaml
from ruamel.yaml.comments import Comment, CommentedSeq, CommentedMap

d = CommentedMap()             # <<<<< most important
for m in ['B1', 'B2', 'B3']:
    d2 = {}
    for f in ['A1', 'A2', 'A3']:
        d2[f] = CommentedSeq(['test', 'test2'])
        if f != 'A2':
            d2[f].fa.set_flow_style()
    d[m] = d2

yaml_str = ruamel.yaml.dump(d, Dumper=ruamel.yaml.RoundTripDumper,
                            default_flow_style=False, width=50, indent=8)

assert not hasattr(d, Comment.attrib)  # no attribute on the CommentedMap

comment = 'Data for Class A'
commented_yaml_str = '# ' + comment + '\n' + yaml_str
c = ruamel.yaml.load(commented_yaml_str, Loader=ruamel.yaml.RoundTripLoader)
assert hasattr(c, Comment.attrib)  # c has the attribute
print(c.ca)                        # and this is what it looks like
print(d.ca)                        # accessing comment attribute creates it empty
assert hasattr(d, Comment.attrib)  # now the CommentedMap has the attribute

This prints:

Comment(comment=[None, [CommentToken(value=u'# Data for Class A\n')]],
  items={})
Comment(comment=None,
  items={})

A Comment has an attribute comment that needs to be set to a two-element list, consisting of the EOL comment (there can only be one) and a list of preceding line comments (in the form of CommentTokens).

To create a CommentToken you need a (fake) StartMark that tells it in which column it starts:

from ruamel.yaml.error import StreamMark
start_mark = StreamMark(None, None, None, 0, None, None)  # column 0

Now you can create the token:

from ruamel.yaml.tokens import CommentToken

ct = CommentToken('# ' + comment + '\n', start_mark, None)

Assign the token as the first element of the preceding list on your CommentedMap:

d.ca.comment = [None, [ct]]
print(d.ca)   # in case you want to check

gives you:

Comment(comment=[None, [CommentToken(value='# Data for Class A\n')]],
  items={})

And finally:

print(ruamel.yaml.dump(d, Dumper=ruamel.yaml.RoundTripDumper,
                       default_flow_style=False, width=50, indent=8))

gives:

# Data for Class A
B1:
        A1: [test, test2]
        A3: [test, test2]
        A2:
        - test
        - test2
B2:
        A1: [test, test2]
        A3: [test, test2]
        A2:
        - test
        - test2
B3:
        A1: [test, test2]
        A3: [test, test2]
        A2:
        - test
        - test2

Of course you don't need to create the c object; that is just for illustration.

What you should use: to make the whole exercise somewhat easier, you can forget about the details and patch the following method into CommentedBase once:

from ruamel.yaml.comments import CommentedBase


def set_start_comment(self, comment, indent=0):
    """Overwrite any preceding comment lines on an object.

    Expects comment to be without `#`; it may span multiple lines.
    """
    from ruamel.yaml.error import StreamMark
    from ruamel.yaml.tokens import CommentToken
    pre_comments = []  # fresh list, so earlier start comments are replaced
    if self.ca.comment is None:
        self.ca.comment = [None, pre_comments]
    else:
        self.ca.comment[1] = pre_comments  # keep the EOL comment, replace the rest
    if comment.endswith('\n'):
        comment = comment[:-1]  # strip final newline if there
    start_mark = StreamMark(None, None, None, indent, None, None)
    for com in comment.split('\n'):
        pre_comments.append(CommentToken('# ' + com + '\n', start_mark, None))


if not hasattr(CommentedBase, 'set_start_comment'):  # don't clobber an existing method
    CommentedBase.set_start_comment = set_start_comment

and then just do:

d.set_start_comment('Data for Class A')