Writing a parser in Rust - peeking two chars ahead


I'm working on a parser in Rust. The goal is to parse into an AST and then use serde to serialize the AST into JSON.

The DSL that I'm going to parse is semi-similar to JavaScript, but much simpler.

use std::iter::Peekable;
use std::str;

pub struct Parser<'a> {
    source: Peekable<str::Chars<'a>>,
}

impl<'a> Parser<'a> {
    pub fn new(source: &'a str) -> Parser<'a> {
        Parser {
            source: source.chars().peekable(),
        }
    }

    pub fn parse(&mut self) -> Resource {
        let mut entities = Map::new();

        self.skip_ws();

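        // Stop at end of input instead of looping forever.
        while self.source.peek().is_some() {
            let entity = self.get_entity();
            // Clone the id so `entity` itself can be moved into the map.
            entities.insert(entity.id.clone(), entity);
            self.skip_ws();
        }
        Resource(entities)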
    }

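    fn get_entity(&mut self) -> Entity {
        let id = self.get_identifier();
        self.skip_line_ws();

        // An entity has the form `id = value`; anything else is an error.
        if !self.next_char('=') {
            panic!("expected '=' after identifier");
        }

        self.bump();

        self.skip_line_ws();

        let value = self.get_pattern();

        // Wanted: a second argument giving the lookahead offset.
        // Peekable::peek() only exposes the very next character.
        if self.next_char('[') && self.next_char('[', 1) {
            // get attributes
            // return entity with attributes
        } else {
            // return entity without attributes
        }
    }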
}

In two cases, peeking a single character is not enough to identify which token I'm collecting. For example, if the peeked character is '[' and the one after it is also '[', then it's not part of the entity, but if it's a '[' followed by anything other than '[', it's an attribute.
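The two-character check described above could be sketched with the standard library alone, assuming the parser keeps a `Peekable<Chars>` like the `source` field in the question. `Chars` is cheap to clone (it is only a pair of pointers into the string), so a clone can be advanced while the real iterator stays put. `Lookahead`, `peek_nth` and `next_char_at` are hypothetical names used for illustration only.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical stand-in for the parser; only the lookahead logic is shown.
struct Lookahead<'a> {
    source: Peekable<Chars<'a>>,
}

impl<'a> Lookahead<'a> {
    fn new(source: &'a str) -> Self {
        Lookahead { source: source.chars().peekable() }
    }

    /// Returns the character `n` positions ahead (0 = the next character)
    /// without advancing `self.source`; a clone is advanced instead.
    fn peek_nth(&self, n: usize) -> Option<char> {
        self.source.clone().nth(n)
    }

    /// True if the character `n` positions ahead equals `expected`.
    fn next_char_at(&self, expected: char, n: usize) -> bool {
        self.peek_nth(n) == Some(expected)
    }
}
```

With this, the check for two consecutive brackets would read `la.next_char_at('[', 0) && la.next_char_at('[', 1)`. Each call walks `n` characters from the current position, which is fine for small fixed offsets such as 1 or 2.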

I know that in theory I can use next() to consume a character and then use peek() to look at the following one, but that breaks down when the result turns out not to be part of the Entity: in that case I'd like to move the pointer back one character and return.
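The "advance, then move back" idea can be approximated by saving a clone of the iterator as a checkpoint and restoring it when the match fails. This is only a sketch under the same `Peekable<Chars>` assumption. `Rewinder` and `eat_double_bracket` are hypothetical names, not part of the parser above.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical parser fragment; only the rewind logic is shown.
struct Rewinder<'a> {
    source: Peekable<Chars<'a>>,
}

impl<'a> Rewinder<'a> {
    fn new(source: &'a str) -> Self {
        Rewinder { source: source.chars().peekable() }
    }

    /// Consumes `[[` and returns true if the input starts with it.
    /// Otherwise restores the iterator to where it was and returns false.
    fn eat_double_bracket(&mut self) -> bool {
        let checkpoint = self.source.clone(); // save the current position
        if self.source.next() == Some('[') && self.source.next() == Some('[') {
            return true;
        }
        self.source = checkpoint; // rewind: nothing was consumed
        false
    }
}
```

Because the checkpoint is a full copy of the iterator state, the same pattern works no matter how many characters were consumed before the mismatch was detected.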

That also doesn't solve the problem in the scenario where I need to peek 3 characters ahead.

It seems to me that I either need the ability to peek two chars ahead, or the ability to advance the iterator and then move it back. I found multipeek in Itertools, which claims to allow peeking multiple characters ahead, but I don't know how to fit it into my parser. Can someone guide me or point me to a different approach?
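As one possible "different approach", here is a minimal std-only sketch that collects the input into a `Vec<char>` and tracks an index, so that both arbitrary lookahead and moving back become index arithmetic. This is not the Itertools multipeek API. `Cursor`, `peek_at`, `bump` and `starts_with` are hypothetical names, and the sketch assumes the whole source fits comfortably in memory.

```rust
/// Hypothetical cursor over a pre-collected character buffer.
struct Cursor {
    chars: Vec<char>,
    pos: usize,
}

impl Cursor {
    fn new(source: &str) -> Self {
        Cursor { chars: source.chars().collect(), pos: 0 }
    }

    /// Looks `n` characters ahead (0 = current) without consuming anything.
    fn peek_at(&self, n: usize) -> Option<char> {
        self.chars.get(self.pos + n).copied()
    }

    /// Consumes and returns the current character, if any.
    fn bump(&mut self) -> Option<char> {
        let c = self.peek_at(0);
        if c.is_some() {
            self.pos += 1;
        }
        c
    }

    /// True if the upcoming characters spell out `prefix`.
    fn starts_with(&self, prefix: &str) -> bool {
        prefix.chars().enumerate().all(|(i, c)| self.peek_at(i) == Some(c))
    }
}
```

Moving back one character is then `cursor.pos -= 1`, and a three-character lookahead is `cursor.peek_at(2)` or `cursor.starts_with("...")` with the expected text. The trade-off is the up-front allocation of one `char` per input character.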
