How to implement "take while, but at most N characters" in nom?

787 views Asked by At

How can I give nom's take_while an upper limit on the number of characters it should match? I want to take characters according to a certain condition, but at most N.

Since this will be a very performance critical part of the parser, I'd like to avoid using a bunch of individual take(1usize). Partly, because it feels awkward having to deal with single element slices one-by-one, but also because the compiler probably cannot know at compile time that they must have a size of 1, i.e., it will likely have to generate bound checks or size assertions, right?

Conceptually, a take_while with an upper limit would feel more appropriate. I was hoping I can do the counting with a mutable loop variable myself like so:

let mut i = 0;

let (input, _) = take_while(|c| {
    i += 1;
    c >= 0x80 && i < 9
})(input)?;

In fact, I could even extract the necessary information in the closure, which most like should result in a very efficient code generation. (The goal is to implement a certain kind of VarInt LSB encoding, i.e., I could update a local mutable variable x = x + (if last_byte { c } else { c & 0x7f }) << 7.)

Unfortunately take_while seems to allow only Fn and not FnMut which probably means this is impossible (why the limitation?). What else could I do to implement that nicely?

2

There are 2 answers

2
cdhowie On BEST ANSWER

Use a Cell to make your closure have interior mutability. Then it can have mutable state, but still implement Fn:

let i = Cell::new(0);

let (input, _) = take_while(|c| {
    i.set(i.get() + 1);
    c > 0x80 && i.get() < 9
})(input)?;
0
Mika Vatanen On

There is a nom-function take_while_m_n:

const N: usize = ...

fn take_native(input: &[u8]) -> IResult<&[u8], &[u8]> {
    take_while_m_n(0, N, |c| c > 0x80)(input)
}

However, with quick benchmark it seems to be remarkably slower than the Cell -answer (or benchmark optimizes wrongly, since the Cell -version takes only 1ns/iter compared to 13ns/iter for the take_while_m_n).

#![feature(test)]
extern crate test;

use std::cell::Cell;

use nom::{
    bytes::complete::{take_while, take_while_m_n},
    IResult,
};

fn take_cell(input: &[u8]) -> IResult<&[u8], &[u8]> {
    let i = Cell::new(0);

    let (input, output) = take_while(|c| {
        i.set(i.get() + 1);
        c > 0x80 && i.get() < 5
    })(input)?;

    Ok((input, output))
}

fn take_native(input: &[u8]) -> IResult<&[u8], &[u8]> {
    take_while_m_n(0, 4, |c| c > 0x80)(input)
}

#[cfg(test)]
mod tests {
    use super::*;

    const INPUT: &[u8] = &[
        0x81, 0x82, 0x83, 0x84, 0x81, 0x82, 0x83, 0x84, 0x81, 0x82, 0x83, 0x84, 0x81, 0x82, 0x83,
        0x84, 0x81, 0x82, 0x83, 0x84, 0x81, 0x82, 0x83, 0x84, 0x81, 0x82, 0x83, 0x84,
    ];

    #[bench]
    fn bench_cell(b: &mut test::Bencher) {
        assert_eq!(take_cell(INPUT).unwrap().1, &[0x81, 0x82, 0x83, 0x84]);

        b.iter(|| take_cell(INPUT).unwrap());
    }

    #[bench]
    fn bench_native(b: &mut test::Bencher) {
        assert_eq!(take_native(INPUT).unwrap().1, &[0x81, 0x82, 0x83, 0x84]);

        b.iter(|| take_native(INPUT).unwrap());
    }
}