I am trying to write a parser for the Scheme programming language's syntax in order to learn nom. I have run into a problem parsing string literals. Is there a good solution?
The problem is as follows:
- A Scheme string can contain a code point literal (e.g. \x03bb; -> λ). It is natural to parse this as a char.
fn parse_hex_literal(str: &str) -> IResult<&str, char> { ... }
- Scheme allows a string literal to span multiple lines by escaping the line ending, and the escaped line ending must be interpreted as nothing. There is no empty char in Rust, so I need to use an empty &str instead.
value("", tuple((space0, line_ending, space0)))
- Because the two parsers above have different output types (char and &str), they cannot be used together in the alt combinator. Also, as far as I can tell, there is no way to convert a char to a &str across a lifetime boundary.
When parsing escapable strings in nom, I think it is natural to combine escaped_transform with alt, but is there a way to handle cases like this, where the branch types do not match?
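To make the first point concrete, here is a minimal sketch of the code point conversion using only the standard library. The helper name `hex_to_char` is hypothetical and is not the elided body of `parse_hex_literal`; it assumes the digits between `\x` and `;` have already been isolated.

```rust
/// Hypothetical helper: turn the hex digits of a `\x...;` escape into a char.
/// Returns None for empty or non-hex input, for values that overflow u32,
/// and for values that are not Unicode scalar values (e.g. surrogates).
fn hex_to_char(digits: &str) -> Option<char> {
    // Reject anything that is not a plain run of hex digits up front,
    // because from_str_radix would otherwise accept a leading '+'.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let code = u32::from_str_radix(digits, 16).ok()?;
    char::from_u32(code)
}
```

For example, `hex_to_char("03bb")` yields `Some('λ')`, while `hex_to_char("d800")` yields `None` because a surrogate is not a valid char. The result is an owned char, which is why it cannot be handed back as a &str borrowed from the input.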
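// This code does not compile: the alt branches produce different types (&str and char).
use nom::branch::alt;
use nom::bytes::complete::escaped_transform;
use nom::character::complete::{char, line_ending, none_of, space0};
use nom::combinator::value;
use nom::sequence::{delimited, tuple};
use nom::IResult;

fn parse_string(input: &str) -> IResult<&str, String> {
    delimited(
        char('"'),
        escaped_transform(
            none_of(r#"\""#),
            '\\',
            alt((
                value("\n", char('n')), // Literals can be either char or &str.
                // ... other escape character literals.
                // intraline -> nothing: &str
                value("", tuple((space0, line_ending, space0))),
                // hex scalar value: char
                delimited(char('x'), parse_hex_literal, char(';')),
            )),
        ),
        char('"'),
    )(input)
}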
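To pin down the result the parser is meant to produce, here is a rough reference sketch written without nom. The function name `unescape` is hypothetical, it takes only the text between the quotes, and it covers just the three escapes shown above (`\n`, an escaped line ending, and `\x...;`); anything else is treated as malformed.

```rust
/// Hypothetical reference for the intended output; it does not use nom.
/// Returns the unescaped text, or None on a malformed or unsupported escape.
fn unescape(body: &str) -> Option<String> {
    let is_blank = |c: char| c == ' ' || c == '\t';
    let mut out = String::new();
    let mut rest = body;
    while let Some(pos) = rest.find('\\') {
        // Copy the unescaped run before the backslash, then skip the backslash.
        out.push_str(&rest[..pos]);
        rest = &rest[pos + 1..];
        if let Some(tail) = rest.strip_prefix('n') {
            out.push('\n');
            rest = tail;
        } else if let Some(tail) = rest.strip_prefix('x') {
            // Hex scalar value: digits up to ';' become one char.
            // Note: from_str_radix also accepts a leading '+', which a
            // stricter implementation would reject.
            let end = tail.find(';')?;
            let code = u32::from_str_radix(&tail[..end], 16).ok()?;
            out.push(char::from_u32(code)?);
            rest = &tail[end + 1..];
        } else {
            // Escaped line ending: blanks, a newline, blanks -> nothing.
            let tail = rest.trim_start_matches(is_blank);
            let tail = tail
                .strip_prefix("\r\n")
                .or_else(|| tail.strip_prefix('\n'))?;
            rest = tail.trim_start_matches(is_blank);
        }
    }
    out.push_str(rest);
    Some(out)
}
```

Here the char case and the "nothing" case coexist only because both are pushed into the same owned String, which is the behavior the nom version would need to reproduce.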