Convert YAML multi-line values to folded block scalar style?

Using ruamel.yaml, I am trying to produce YAML in a particular style: single-line strings should start on the same line as the :, multi-line strings should use a folded scalar style (|/|-), and lines should be limited to a certain number of characters (word-wrapped).

My attempt so far, heavily influenced by a similar function called walk_tree in the ruamel.yaml sources:

#!/usr/bin/env python

import ruamel.yaml

from ruamel.yaml.scalarstring import ScalarString, PreservedScalarString

def walk_tree(base):
    from ruamel.yaml.compat import string_types

    if isinstance(base, dict):
        for k in base:
            v = base[k]
            if isinstance(v, string_types):
                v = v.replace('\r\n', '\n').replace('\r', '\n').strip()
                base[k] = ScalarString(v) if '\n' in v else v
            else:
                walk_tree(v)
    elif isinstance(base, list):
        for idx, elem in enumerate(base):
            if isinstance(elem, string_types) and '\n' in elem:
                print(elem) # @Anthon: this print is in the original code as well
                base[idx] = preserve_literal(elem)
            else:
                walk_tree(elem)

with open("input.yaml", "r") as fi:
    inp = fi.read()

loader=ruamel.yaml.RoundTripLoader
data = ruamel.yaml.load(inp, loader)

walk_tree(data)

dumper = ruamel.yaml.RoundTripDumper

with open("output.yaml", "w") as fo:
    ruamel.yaml.dump(data, fo, Dumper=dumper, allow_unicode=True)

But then I get an exception: ruamel.yaml.representer.RepresenterError: cannot represent an object: …. I get no exception if I replace ScalarString with PreservedScalarString, as in the original walk_tree code, but then I get literal blocks again, which is not what I want.

So how can my code be fixed so that it will work?

Anthon On BEST ANSWER

The class ScalarString is only a base class for LiteralScalarString and has no representer of its own, as you found out. You should just make (or keep) the value a plain Python string, because the dumper then deals with special characters appropriately (quoting strings that need to be quoted to conform to the YAML specification).
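To illustrate why the base class fails while its subclass works, here is a toy sketch. It assumes that handlers are looked up by the exact type of the value, with no fallback to base classes. The registry, the handler functions and the classes below are hypothetical stand-ins written for this example, not ruamel.yaml's actual code:

```python
class ScalarString(str):
    """Toy stand-in for the base class: no handler is registered for it."""


class LiteralScalarString(ScalarString):
    """Toy stand-in for the subclass that does have a handler."""


class RepresenterError(Exception):
    """Toy stand-in for the error raised when no handler is found."""


# Hypothetical registry keyed by exact type (illustration only).
HANDLERS = {
    str: lambda s: ("plain", str(s)),
    LiteralScalarString: lambda s: ("literal", str(s)),
}


def represent(value):
    # Exact-type lookup: being a subclass of str is not enough.
    handler = HANDLERS.get(type(value))
    if handler is None:
        raise RepresenterError("cannot represent an object: %r" % (value,))
    return handler(value)
```

Under that assumption, a plain str and a LiteralScalarString are both handled, while a bare ScalarString instance falls through to the error.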

Assuming you have input like this:

- 1
- abc: |
    this is a short string scalar with a newline
    in it
- "there are also a multiline\nsequence element\nin this file\nand it is longer"

You probably want to do something like:

from ruamel.yaml import YAML  # needed for the YAML() instance used below
from ruamel.yaml.scalarstring import LiteralScalarString, preserve_literal


def walk_tree(base):
    from ruamel.yaml.compat import string_types

    def test_wrap(v):
        v = v.replace('\r\n', '\n').replace('\r', '\n').strip()
        return v if len(v) < 72 else preserve_literal(v)

    if isinstance(base, dict):
        for k in base:
            v = base[k]
            if isinstance(v, string_types) and '\n' in v:
                base[k] = test_wrap(v)
            else:
                walk_tree(v)
    elif isinstance(base, list):
        for idx, elem in enumerate(base):
            if isinstance(elem, string_types) and '\n' in elem:
                base[idx] = test_wrap(elem)
            else:
                walk_tree(elem)

yaml = YAML()

with open("input.yaml", "r") as fi:
    data = yaml.load(fi)

walk_tree(data)

with open("output.yaml", "w") as fo:
    yaml.dump(data, fo)

to get output:

- 1
- abc: "this is a short string scalar with a newline\nin it"
- |-
  there are also a multiline
  sequence element
  in this file
  and it is longer

Some notes:

  • Use of LiteralScalarString is preferred over PreservedScalarString. The latter name is a remnant from the time when it was the only preserved string type.
  • You probably had no sequence elements that were strings, as you did not import preserve_literal, although it was still used in the copied code.
  • I factored out the "wrapping" code into test_wrap, which is used for both mapping values and sequence elements. The maximum length before switching to literal style is set to 72 characters.
  • The value data[1]['abc'] loads as LiteralScalarString. If you want to preserve existing literal style string scalars, you should test for those before testing for type string_types.
  • I used the new API with an instance of YAML().
  • If you increase the 72 in the example to above the default width of 80, you might have to set the width attribute to something like 1000 to prevent automatic line wrapping (yaml.width = 1000).
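The note about preserving existing literal scalars can be sketched without ruamel.yaml installed. This is a minimal illustration under assumptions: the LiteralScalarString class here is a local stand-in (like the real one, a str subclass), plain dict and list stand in for the loaded data, and wrap_if_long and convert are hypothetical helper names. The point is the order of the isinstance checks:

```python
class LiteralScalarString(str):
    """Local stand-in for ruamel.yaml's class of the same name (a str subclass)."""


MAX_LEN = 72  # same threshold as in the answer's example


def wrap_if_long(v):
    # Normalize line endings, strip, and mark long strings as literal.
    v = v.replace('\r\n', '\n').replace('\r', '\n').strip()
    return v if len(v) < MAX_LEN else LiteralScalarString(v)


def convert(value):
    # Test for the subclass first: it is also a str, so the
    # generic str branch would otherwise strip and re-wrap it.
    if isinstance(value, LiteralScalarString):
        return value
    if isinstance(value, str) and '\n' in value:
        return wrap_if_long(value)
    walk_tree(value)  # recurse into nested containers, ignore other scalars
    return value


def walk_tree(base):
    if isinstance(base, dict):
        for k in base:
            base[k] = convert(base[k])
    elif isinstance(base, list):
        for idx, elem in enumerate(base):
            base[idx] = convert(elem)


data = [
    1,
    {"abc": LiteralScalarString("this is a short string scalar with a newline\nin it\n")},
    "there are also a multiline\nsequence element\nin this file\nand it is longer",
]
walk_tree(data)
```

After the walk, data[1]["abc"] is left untouched even though it is shorter than 72 characters, while the long sequence element is converted to the literal type.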