How can I consume a list of tokens that may or may not be separated by a space?
I'm trying to parse Chinese romanization (pinyin) in the cedict format with nom (6.1.2). For example, "ni3 hao3 ma5" is, due to human error in transcription, sometimes written as "ni3hao3ma5" or "ni3hao3 ma5" (note the variable spacing).
I have written a parser that handles individual syllables, e.g. ["ni3", "hao3", "ma5"], and I'm trying to use nom::multi::separated_list0 to parse the whole sequence like so:
    nom::multi::separated_list0(
        nom::character::complete::space0,
        syllable,
    )(i)?;
However, I get Err(Error(Error { input: "", code: SeparatedList })) after all the tokens have been consumed.
The problem with using nom::multi::separated_list0 with space0 as the separator is that the space0 delimiter matches the empty string, so once the end of the input string is reached, separated_list0 keeps trying to consume the empty string, hence the Err(Error(Error { input: "", code: SeparatedList })).
The solution in my case was to use nom::multi::many1 instead of nom::multi::separated_list0 and to handle the optional spaces in the inner parser, like so: