I want to use Python to read and write YAML frontmatter in markdown files. I have come across the ruamel.yaml package but am having trouble understanding how to use it for this purpose.
If I have a markdown file:
```markdown
---
car:
  make: Toyota
  model: Camry
---
# My Ultimate Car Review
This is a good car.
```
First, is there a way to load the YAML data into variables in my Python code?
Second, is there a way to write new values back to the YAML in the markdown file?
For the first, I have tried:
```python
from ruamel.yaml import YAML
import sys
f = open("cars.txt", "r+") # I'm really not sure if r+ is ideal here.
yaml = YAML()
code = yaml.load(f)
print(code['car']['make'])
```
but get an error:
```
ruamel.yaml.composer.ComposerError: expected a single document in the stream
  in "cars.txt", line 2, column 1
but found another document
  in "cars.txt", line 5, column 1
```
For the second, I have tried:
```python
from ruamel.yaml import YAML
import sys
f = open("cars.txt", "r+") # I'm really not sure if r+ is ideal here.
yaml = YAML()
code = yaml.load(f)
code['car']['model'] = 'Sequoia'
```
but get the same error:
```
ruamel.yaml.composer.ComposerError: expected a single document in the stream
  in "cars.txt", line 2, column 1
but found another document
  in "cars.txt", line 5, column 1
```
When you have multiple YAML documents in one file, they are separated by a line consisting of three dashes, or starting with three dashes followed by a space. Most YAML parsers, including `ruamel.yaml`, expect either a single-document file (when using `YAML().load()`) or a multi-document file (when using `YAML().load_all()`). The method `.load()` returns the single data structure, and complains if there seems to be more than one document (i.e. when it encounters the second `---` in your file). The `.load_all()` method can handle one or more YAML documents, but always returns an iterator.

Your input happens to be a valid multi-document YAML file, but the markdown part often makes this not the case. It could easily always have been valid YAML by just changing the second `---` into `--- |`, thereby making the markdown part a (multi-line) literal scalar string. I have no idea why the designers of such YAML frontmatter formats didn't specify that; it might have to do with the fact that some parsers (like PyYAML) fail to parse such non-indented literal scalar strings at the root level correctly, although examples of those are in the YAML specification.

In your example the markdown part is so simple that it is valid YAML without having to specify the `|` for a literal scalar string, so you could use `.load_all()` on this input. But just adding e.g. a line starting with a dash to the markdown section can easily result in an invalid YAML document, so if you use `.load_all()`, you have to make sure you do not iterate so far as to parse the second document.
You shouldn't try to update the file in place, however (so don't use `r+`), as your YAML frontmatter might become longer than the original and updating would then overwrite your markdown. For updating, read the file into memory, split it into two parts at the second line of dashes, update the data, dump it, and then append the dashes and the markdown.