Is there a good solution for parsing Scheme's multiline string literal syntax with nom?

I am writing a parser for Scheme syntax as a way to learn nom, and I am stuck on parsing string literals. Is there a good solution?

The problem is as follows:

1. A Scheme string can contain a hex scalar value escape (e.g. `\x03bb;` -> `λ`). It is natural to parse this as a `char`:

   ```rust
   fn parse_hex_literal(input: &str) -> IResult<&str, char> { ... }
   ```

2. A string literal can span multiple lines by escaping the line ending, and the escaped line ending, together with the intraline whitespace around it, must be interpreted as nothing. There is no empty `char` in Rust, so I need to use an empty `&str` instead:

   ```rust
   value("", tuple((space0, line_ending, space0)))
   ```

3. Because 1 and 2 have different output types (`char` and `&str`), they cannot be used together in the `alt` combinator. Also, as far as I can tell, there is no way to convert a `char` into a `&str` that outlives the parsing function.
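For reference, the hex scalar value case on its own is straightforward to decode into a `char` with the standard library. The sketch below does not use nom; the name `decode_hex_scalar` is hypothetical, and it assumes the input starts right after the `\x` and returns the rest of the input alongside the `char`, similar in shape to `IResult<&str, char>`.

```rust
/// Hypothetical helper: decodes a Scheme hex scalar value such as `03bb;`
/// (the part after `\x`) into a `char`, returning the remaining input.
/// Returns `None` if there are no hex digits, the `;` is missing, or the
/// value is not a valid Unicode scalar value (e.g. a surrogate).
fn decode_hex_scalar(input: &str) -> Option<(&str, char)> {
    // Byte length of the leading run of ASCII hex digits.
    let digits_len = input
        .find(|c: char| !c.is_ascii_hexdigit())
        .unwrap_or(input.len());
    if digits_len == 0 {
        return None;
    }
    let (digits, rest) = input.split_at(digits_len);
    // The digits must be terminated by ';'.
    let rest = rest.strip_prefix(';')?;
    // Overflowing values and surrogates are rejected here.
    let code = u32::from_str_radix(digits, 16).ok()?;
    let c = char::from_u32(code)?;
    Some((rest, c))
}
```

For example, `decode_hex_scalar("03bb;abc")` returns `Some(("abc", 'λ'))`. The result is naturally a `char`, which is exactly why it clashes with the `&str` produced by the line-continuation branch.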

When parsing strings with escapes in nom, I think the natural approach is `escaped_transform` with an `alt` for the escape cases, but is there a way to handle cases like this one, where the alternatives' output types don't match?
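One general way around such a type mismatch, sketched here in plain Rust without nom, is to give every alternative a common output type, for example an enum with one variant per kind of fragment, and fold the fragments into a `String` afterwards. The names `Fragment` and `assemble` are hypothetical. I believe nom's own `examples/string.rs` takes a similar route, combining an enum of fragments with `fold_many0` instead of `escaped_transform`, but treat that as something to check rather than a given.

```rust
/// Hypothetical common output type for every branch of the escape `alt`.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Fragment<'a> {
    /// A run of unescaped characters, borrowed from the input.
    Literal(&'a str),
    /// A decoded escape such as `\n` or `\x03bb;`.
    EscapedChar(char),
    /// An escaped line ending plus surrounding whitespace: contributes nothing.
    EscapedWs,
}

/// Folds parsed fragments into the final owned string.
fn assemble<'a>(fragments: impl IntoIterator<Item = Fragment<'a>>) -> String {
    fragments
        .into_iter()
        .fold(String::new(), |mut acc, fragment| {
            match fragment {
                Fragment::Literal(s) => acc.push_str(s),
                Fragment::EscapedChar(c) => acc.push(c),
                Fragment::EscapedWs => {} // line continuation: append nothing
            }
            acc
        })
}
```

Because `EscapedWs` carries no data, there is no need for an "empty `char`": the line continuation simply appends nothing when the fragments are folded.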

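The following code does not compile:

```rust
use nom::{
    branch::alt,
    bytes::complete::escaped_transform,
    character::complete::{char, line_ending, none_of, space0},
    combinator::value,
    sequence::{delimited, tuple},
    IResult,
};

fn parse_string(input: &str) -> IResult<&str, String> {
    delimited(
        char('"'),
        escaped_transform(
            none_of(r#"\""#),
            '\\',
            alt((
                value("\n", char('n')), // Escape results can be either char or &str.
                // ... other escape character literals.

                // escaped line ending -> nothing: &str
                value("", tuple((space0, line_ending, space0))),

                // hex scalar value: char (does not match the &str arms above)
                delimited(char('x'), parse_hex_literal, char(';')),
            )),
        ),
        char('"'),
    )(input)
}
```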
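To pin down the intended behavior independently of nom, here is a hand-written, std-only sketch of the same grammar. It returns the remaining input and the decoded `String`, mirroring `IResult<&str, String>` but with `Option` in place of nom's error type. Every escape yields an `Option<char>`, where `None` stands for the line continuation, so all escape cases share one type. The names `parse_string_std`, `parse_escape` and `is_intraline_space`, the small escape set (`\n`, `\\`, `\"`, `\x...;`), and treating only spaces and tabs as intraline whitespace (like nom's `space0`) are assumptions for illustration.

```rust
/// Intraline whitespace as matched by nom's `space0`: spaces and tabs.
fn is_intraline_space(c: char) -> bool {
    c == ' ' || c == '\t'
}

/// Decodes one escape sequence; `input` starts just after the backslash.
/// Returns the remaining input and `Some(c)` for a decoded character, or
/// `None` for an escaped line ending, which contributes nothing.
fn parse_escape(input: &str) -> Option<(&str, Option<char>)> {
    let mut chars = input.chars();
    match chars.next()? {
        'n' => Some((chars.as_str(), Some('\n'))),
        '\\' => Some((chars.as_str(), Some('\\'))),
        '"' => Some((chars.as_str(), Some('"'))),
        'x' => {
            // Hex scalar value: one or more hex digits terminated by ';'.
            let rest = chars.as_str();
            let end = rest.find(';')?;
            let digits = &rest[..end];
            if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_hexdigit()) {
                return None;
            }
            let code = u32::from_str_radix(digits, 16).ok()?;
            Some((&rest[end + 1..], Some(char::from_u32(code)?)))
        }
        _ => {
            // Line continuation: space0, line ending (LF or CRLF), space0.
            let rest = input.trim_start_matches(is_intraline_space);
            let rest = rest
                .strip_prefix("\r\n")
                .or_else(|| rest.strip_prefix('\n'))?;
            Some((rest.trim_start_matches(is_intraline_space), None))
        }
    }
}

/// Parses a double-quoted string literal, returning (remaining input, value).
fn parse_string_std(input: &str) -> Option<(&str, String)> {
    let mut rest = input.strip_prefix('"')?;
    let mut out = String::new();
    loop {
        let mut chars = rest.chars();
        match chars.next()? {
            // Closing quote: the literal is complete.
            '"' => return Some((chars.as_str(), out)),
            '\\' => {
                let (after, decoded) = parse_escape(chars.as_str())?;
                // `Option<char>` iterates over zero or one chars, so a line
                // continuation (`None`) appends nothing.
                out.extend(decoded);
                rest = after;
            }
            c => {
                out.push(c);
                rest = chars.as_str();
            }
        }
    }
}
```

Note that whitespace *before* the backslash is kept (`"abc \` followed by a newline and `   def"` decodes to `abc def`), since only the whitespace after the backslash belongs to the escape.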