Why isn't two-spaced YAML parsed like four-spaced YAML?


I'm seeing strange behavior when parsing YAML (using Ruby 2.5/Psych) that is indented with two spaces per level. The same file indented with four spaces per level works, to my mind, as expected.

Two spaces:

windows:
  - shell:
    panes:
      - echo hello

results in the following hash:

{"windows"=>[{"shell"=>nil, "panes"=>["echo hello"]}]}

Whereas using four space indentations:

windows:
    - shell:
        panes:
            - echo hello

results in:

{"windows"=>[{"shell"=>{"panes"=>["echo hello"]}}]}
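For anyone who wants to reproduce this, here is a minimal Ruby sketch, assuming Ruby's bundled `yaml` library (Psych). The variable names are only illustrative. `YAML.safe_load` is used because these documents contain nothing but plain strings, arrays and hashes; `YAML.load` gives the same result here.

```ruby
require 'yaml'

# The file as written with two spaces per level.
two_space = <<~DOC
  windows:
    - shell:
      panes:
        - echo hello
DOC

# The same file with every two-space step replaced by four spaces.
four_space = <<~DOC
  windows:
      - shell:
          panes:
              - echo hello
DOC

# Here "panes" ends up as a sibling key of "shell"...
p YAML.safe_load(two_space)
# ...while here it ends up nested under "shell".
p YAML.safe_load(four_space)
```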

I just skimmed through the spec and didn't see anything relevant to this issue.

Is this behavior expected? If so, I'd greatly appreciate links to resources explaining why.


There are 2 answers

Wayne Conrad (best answer)

The trouble is that you cannot simply replace every two spaces with four spaces. That is because in this pair of lines:

  - shell:
    panes:

these two spaces in the second line:

    panes:
  ^^

are an abbreviation for the "- " in the line above. If the second line were not abbreviated, the pair of lines would be:

  - shell:
  - panes:

So when doubling the indentation, only the first pair of spaces in the second of these lines should be doubled, not the second pair. That yields the correct indentation for the pair:

    - shell:
      panes:

So, if you only expand the first pair of spaces in the "panes:" line, you get:

windows:
    - shell:
      panes:
          - echo hello

This parses to the same structure as the two-space version.
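A quick way to check this claim, as a sketch assuming Ruby's bundled `yaml` (Psych) and the "echo hello" pane from the question; the variable names are made up for illustration. It compares the two-space original with the version where only the indentation steps, and not the stand-in for "- ", are doubled.

```ruby
require 'yaml'

two_space = <<~DOC
  windows:
    - shell:
      panes:
        - echo hello
DOC

# Only the indentation steps are doubled; the two spaces that stand in
# for the "- " indicator on the "panes:" line stay at two.
doubled_correctly = <<~DOC
  windows:
      - shell:
        panes:
            - echo hello
DOC

# Prints true: both documents describe the same structure.
puts YAML.safe_load(two_space) == YAML.safe_load(doubled_correctly)
```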

flyx

While Wayne's solution is correct, the explanation seems a bit off, so I'll throw in mine:

In YAML, the "-" for block sequence items (like "?" and ":" for block mappings) is treated as indentation (spec):

The “-”, “?” and “:” characters used to denote block collection entries are perceived by people to be part of the indentation. This is handled on a case-by-case basis by the relevant productions.

Moreover, all block collections (sequences and mappings) take their indentation from their first item, since there is no explicit starting indicator. So in the line "- shell:", the "-" defines the indentation level of the newly started sequence, while at the same time "shell:" defines the indentation level of the newly started mapping, which is the content of the sequence item. Note how the "-" is treated as indentation when determining the indentation level of the mapping.
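A small sketch of that rule, assuming Ruby's bundled `yaml` (Psych); the keys `a` and `b` are arbitrary placeholders. A key lined up with the text right after "- " continues the mapping that started on the "-" line, whereas a new "-" opens a new sequence item.

```ruby
require 'yaml'

# "b" starts in the same column as "a" (right after "- "), so it is a
# second key of the mapping that "a" started inside the first item.
same_item = YAML.safe_load(<<~DOC)
  - a: 1
    b: 2
DOC

# Each "-" opens its own sequence item, each holding a one-key mapping.
two_items = YAML.safe_load(<<~DOC)
  - a: 1
  - b: 2
DOC

p same_item # one hash with two keys
p two_items # two hashes with one key each
```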

Now, revisiting your first example:

windows:
  - shell:
    panes:
      - echo hello

"panes:" is on the same level as "shell:". This means that YAML parses it as a key of the mapping started by "shell:", so the key "shell" has an empty (null) value. Mapping values of implicit keys, if not on the same line, must always be indented more than the corresponding mapping key (spec):

The block node’s properties may span across several lines. In this case, they must be indented by at least one more space than the block collection, regardless of the indentation of the block collection entries.
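If the goal is the nested structure from the four-space file while keeping two-space steps, "panes:" has to start to the right of "shell:". A sketch, assuming Ruby's bundled `yaml` (Psych) and assuming that nested hash is the structure actually wanted:

```ruby
require 'yaml'

# "shell:" begins after the four characters "  - ", so "panes:" must start
# further right than that to become the value of "shell".
nested_two_space = <<~DOC
  windows:
    - shell:
        panes:
          - echo hello
DOC

p YAML.safe_load(nested_two_space)
```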

OTOH, in the second example:

windows:
    - shell:
        panes:
            - echo hello

"panes:" is on a deeper indentation level than "shell:". This means that it is parsed as the value of the key "shell" and thus starts a new, nested block mapping.
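Only the relative position matters, not whether you used two or four spaces. The sketch below, assuming Ruby's bundled `yaml` (Psych), builds the document with "panes:" shifted a given number of extra columns past "shell:"; the helper `parse_with_extra_indent` is hypothetical and exists only for this illustration.

```ruby
require 'yaml'

# Hypothetical helper: place "panes:" `extra` columns to the right of
# "shell:" (which begins at column 4, after "  - ") and parse the result.
def parse_with_extra_indent(extra)
  pad = " " * (4 + extra)
  doc = "windows:\n  - shell:\n#{pad}panes:\n#{pad}  - echo hello\n"
  YAML.safe_load(doc)
end

# With 0 extra columns "panes" is a sibling of "shell";
# with 1 or more it becomes the value of "shell".
(0..3).each do |extra|
  puts "#{extra}: #{parse_with_extra_indent(extra).inspect}"
end
```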

Finally, note that since "-" is treated as part of the indentation, "indenting by two spaces" could also mean this:

windows:
- shell:
    panes:
    - echo hello

Note how the "-" indicators are not indented more than their parent mapping keys. This works because the spec says:

Since people perceive the “-” indicator as indentation, nested block sequences may be indented by one less space to compensate, except, of course, if nested inside another block sequence (block-out context vs. block-in context).
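To confirm that this compact style means the same thing as the four-space file, here is a sketch assuming Ruby's bundled `yaml` (Psych); the variable names are illustrative.

```ruby
require 'yaml'

# Sequences written with "-" in the same column as their parent key,
# which is allowed because the "-" itself counts as indentation.
compact = <<~DOC
  windows:
  - shell:
      panes:
      - echo hello
DOC

four_space = <<~DOC
  windows:
      - shell:
          panes:
              - echo hello
DOC

# Prints true: both documents nest "panes" under "shell".
puts YAML.safe_load(compact) == YAML.safe_load(four_space)
```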