Is this a valid YAML list? How do I make YamlDotNet read it?


I have old C++ code which manually parses YAML and has a large number of unit tests. I am converting it over to C# using YamlDotNet. But one of the features of the old code was that it could handle a bit of variability in the lists. For example, consider the following list...

images:
  - image01.png 
  - image02.png
  - image03.png

The old parsing code could handle the case where there is no space between the '-' character and the item. So it can also read this with no problem.

images:
  - image01.png 
  -image02.png
  - image03.png

Unfortunately, YamlDotNet does not parse this. When I converted the unit test that specifically verifies this case, YamlDotNet threw a YamlDotNet.Core.SemanticErrorException with this message:

While parsing a block collection, did not find expected '-' indicator

But if I put the space back between the '-' and the "image02.png", it reads the YAML perfectly.

So my questions are:

  1. Is this technically even valid YAML? That is, should the list above always cause a fault? If not, I'll just get rid of the unit test.
  2. But if it is valid YAML, is there an option I can pass to the YamlDotNet Deserializer to handle it?

There are 2 answers

pmf (best answer)

In YAML,

  • - image01.png (with the space) encodes the string image01.png as an array item, while
  • -image02.png encodes just the string -image02.png (including the dash). A bare string cannot appear at that position between the entries of the surrounding array, so it invalidates your document.

So the answer to your first question is: no, it's not valid YAML. See section 2.1, Collections, of the YAML specification:

Block sequences indicate each entry with a dash and space (“- ”).

Anthon

What you have there is a sequence in block-style YAML, i.e. a block sequence, one of the two block collection styles in YAML (the other being the block mapping). Calling it a list (or an array, or whatever it is loaded as in your programming language of choice) makes it hard to find the actual description of block sequences in the specification. The first paragraph of that description reads:

A block sequence is simply a series of nodes, each denoted by a leading “-” indicator. The “-” indicator must be separated from the node by white space. This allows “-” to be used as the first character in a plain scalar if followed by a non-space character (e.g. “-42”).

The production rules there indicate that the - (called a sequence entry indicator) doesn't have to be followed by a space, but only if the following node is empty:

images:
  - image01.png 
  -
  - image03.png
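
The rule quoted above can be sketched in C# as a single line-level check. This is only an illustration of the "dash followed by white space or nothing" rule, not how YamlDotNet or any real parser decides it: the names DashRule and StartsWithEntryIndicator are hypothetical, and the check ignores context such as block scalars and flow collections.

```csharp
using System;

public static class DashRule
{
    // Returns true when the first non-space character of the line is '-'
    // and that dash is followed by white space or by the end of the line.
    // Only then can the dash act as a sequence entry indicator; otherwise
    // it is the first character of a plain scalar such as "-42".
    // Assumption: the line is looked at in isolation, with no parser state.
    public static bool StartsWithEntryIndicator(string line)
    {
        string trimmed = line.TrimStart(' ');
        if (trimmed.Length == 0 || trimmed[0] != '-')
        {
            return false;
        }
        return trimmed.Length == 1 || char.IsWhiteSpace(trimmed[1]);
    }
}
```

Under this check, "  - image01.png" and "  -" both start with an entry indicator, while "  -image02.png" does not.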

The following would also be valid YAML:

images:
  - image01.png 
   -image02.png
  - image03.png

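Here the sequence that is the value for the key images has two elements: the scalar image01.png -image02.png and the scalar image03.png. Because the second line is indented more deeply than the entries, it is read as a continuation of the plain scalar that starts on the line before it.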

Both the latter example and your required input are one inserted space away from the YAML you want, so it is difficult for a real YAML parser to give a good suggestion of what is wrong. I am therefore not surprised that the "manual" parsing got this wrong.
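
If the old lenient behaviour has to be kept and no deserializer option for it turns up, one possible workaround is to repair the text before handing it to YamlDotNet. The sketch below is a hedged illustration, not a YamlDotNet feature: LenientYaml and InsertMissingDashSpaces are hypothetical names, and the heuristic only repairs a "-item" line when its indentation equals that of the most recent well-formed entry. That leaves the more deeply indented continuation line from the example above untouched, but it can still misfire inside block scalars, and it will not repair a sequence whose first entry is already missing the space.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LenientYaml
{
    // Well-formed entry: spaces, '-', then white space or end of line.
    private static readonly Regex GoodEntry =
        new Regex(@"^(?<indent> *)-(\s|$)");

    // Suspect entry: spaces, '-', then a character that is neither
    // white space nor another '-' (so "---" is never touched).
    private static readonly Regex TightEntry =
        new Regex(@"^(?<indent> *)-(?<rest>[^\s-].*)$");

    // Inserts the missing space after '-' on lines that sit at the same
    // indentation as the most recent well-formed sequence entry.
    public static string InsertMissingDashSpaces(string yaml)
    {
        string[] lines = yaml.Split('\n');
        int entryIndent = -1; // -1 means no well-formed entry seen yet

        for (int i = 0; i < lines.Length; i++)
        {
            Match good = GoodEntry.Match(lines[i]);
            if (good.Success)
            {
                entryIndent = good.Groups["indent"].Length;
                continue;
            }

            Match tight = TightEntry.Match(lines[i]);
            if (tight.Success && tight.Groups["indent"].Length == entryIndent)
            {
                lines[i] = tight.Groups["indent"].Value + "- " + tight.Groups["rest"].Value;
            }
        }

        return string.Join("\n", lines);
    }
}
```

The repaired string can then be passed to the deserializer in the usual way. Since this changes what counts as acceptable input, it is probably worth keeping the old unit test but pointing it at the preprocessing step rather than at YamlDotNet itself.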