Using the parser combinator crate "nom" to partially read and parse a file


I have a use case where I only want to parse the beginning of a file, up to the point where I decide to stop parsing. This can be 4 kB or more; the exact amount is not known in advance. So there is no need to read the whole file (which can be 100 kB, for instance).

Everything I have seen in nom, the most widely used parser combinator crate, parses whole chunks of bytes or characters that are already completely in memory.
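One way to read only as much of the file as needed, sketched here with std only and not taken from the answer below, is to keep the bytes read so far in a growing buffer, re-run the parser on that buffer, and read another chunk only while the parser reports that it needs more input. nom's streaming parsers (for example those in nom::bytes::streaming) signal that situation with Err::Incomplete; in this sketch a hypothetical Parse enum and a hypothetical parse_header function (which assumes the header ends at the first blank line) stand in for a real nom parser.

```rust
use std::io::{self, Read};

/// Outcome of one parse attempt over the bytes read so far.
/// Hypothetical type playing the role of nom's `Err::Incomplete`.
#[derive(Debug, PartialEq)]
enum Parse<T> {
    Done(T),
    Incomplete,
}

/// Hypothetical parser: assumes the header ends at the first blank line ("\n\n").
/// Returns the offset of that blank line, or `Incomplete` if it has not been seen yet.
fn parse_header(buf: &[u8]) -> Parse<usize> {
    match buf.windows(2).position(|w| w == b"\n\n") {
        Some(pos) => Parse::Done(pos),
        None => Parse::Incomplete,
    }
}

/// Reads `reader` in chunks of `chunk` bytes, re-running `parse` on the
/// growing buffer, and stops as soon as `parse` is satisfied.
/// Returns `Ok(None)` if EOF is reached first.
fn read_until_parsed<R: Read, T>(
    mut reader: R,
    chunk: usize,
    parse: impl Fn(&[u8]) -> Parse<T>,
) -> io::Result<Option<T>> {
    let mut buf = Vec::new();
    let mut tmp = vec![0u8; chunk];
    loop {
        // Errors (including ErrorKind::Interrupted) are simply propagated here.
        let n = reader.read(&mut tmp)?;
        if n == 0 {
            return Ok(None); // EOF before the parser was satisfied
        }
        buf.extend_from_slice(&tmp[..n]);
        if let Parse::Done(value) = parse(&buf) {
            return Ok(Some(value));
        }
    }
}
```

With a 4-byte chunk size and an input of 46 bytes whose blank line sits at offset 19, the loop stops after reading 24 bytes; the rest is never touched. For a real file you would pass the opened File and a chunk size such as 4096. Re-parsing the whole buffer after every chunk is simple but repeats work, which is usually negligible for a header of a few kilobytes.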

My idea of building my own data structure that reads from the file on demand didn't work out, because nom's input traits only give me an immutable &self reference, so I cannot update my cache inside them:


  use std::fs::OpenOptions;
  use std::io::Read;
  use std::path::PathBuf;

  use nom::{Compare, CompareResult, InputLength};

  struct MyInput {
      pb: PathBuf,
      read: Box<dyn Read>,
      filelength: u64,
      current_content: String,
  }

  impl MyInput {
      fn new(pb: PathBuf) -> Self {
          let file = OpenOptions::new().read(true).open(&pb).unwrap();
          let filelength = std::fs::metadata(&pb).unwrap().len();
          let current_content = String::new();
          Self { pb, read: Box::new(file), filelength, current_content }
      }
  }

  impl InputLength for MyInput {
      fn input_len(&self) -> usize {
          self.filelength as usize // I have to tweak this later because of char vs. u8
      }
  }

  // Here lies the problem:

  impl Compare<&str> for MyInput {
      fn compare(&self, t: &str) -> CompareResult {
          // I cannot fill a cache in self.current_content as I intended,
          // because &self is immutable.
          todo!()
      }

      fn compare_no_case(&self, t: &str) -> CompareResult {
          todo!()
      }
  }
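On the char vs. u8 comment in input_len: std::fs::metadata(..).len() reports the file size in bytes, and Rust's str::len also counts UTF-8 bytes, so both only differ from a character count when the text contains non-ASCII characters. A tiny illustration using a hypothetical helper, byte_and_char_len:

```rust
/// Hypothetical helper: returns (length in bytes, number of chars).
fn byte_and_char_len(s: &str) -> (usize, usize) {
    // str::len counts UTF-8 bytes, the same unit as fs::metadata(..).len();
    // chars().count() counts Unicode scalar values.
    (s.len(), s.chars().count())
}
```

For "häh" this gives (4, 3), because 'ä' takes two bytes in UTF-8; for pure ASCII text both numbers are equal.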


Is there another solution for my problem?

Answer by Aleksander Krauze:

I don't think this should be your preferred solution, but if you want to mutate self and the trait's interface only gives you a &self, you can use the interior mutability pattern, for example RefCell. The Book has a detailed explanation of it in its chapter on RefCell<T> and the interior mutability pattern. For a quick snippet, you could use the following. Note that RefCell checks borrows at runtime, which adds a small overhead for tracking the active references and panics if a mutable borrow overlaps with any other borrow.

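use std::cell::RefCell;
use std::io::Read;
use std::ops::DerefMut;
use std::path::PathBuf;

use nom::{Compare, CompareResult};

struct Inner {
    pb: PathBuf,
    read: Box<dyn Read>,
    filelength: u64,
    current_content: String,
}

struct MyInput {
    inner: RefCell<Inner>,
}

impl Compare<&str> for MyInput {
    fn compare(&self, t: &str) -> CompareResult {
        // Keep the RefMut guard in a local so it lives for the whole body.
        // Calling deref_mut() directly on the temporary returned by
        // borrow_mut() fails to compile once inner is used afterwards,
        // because the guard is dropped at the end of that statement.
        let mut guard = self.inner.borrow_mut();
        // The call to deref_mut is only to show that you can obtain
        // a mutable reference to inner (e.g. to fill current_content).
        let inner: &mut Inner = guard.deref_mut();

        todo!()
    }

    fn compare_no_case(&self, t: &str) -> CompareResult {
        todo!()
    }
}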
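The same interior-mutability idea can be shown as a self-contained sketch that does not depend on nom. LazyInput, starts_with, and cached_len are hypothetical names; a real Compare impl would map the outcome onto nom's CompareResult (Ok, Incomplete, Error) instead of returning a bool, and the other input traits nom needs would still have to be implemented.

```rust
use std::cell::RefCell;
use std::io::{self, Read};

/// Reader plus the bytes pulled from it so far.
struct Inner<R> {
    reader: R,
    cache: Vec<u8>,
}

/// Hypothetical lazily filled input: comparisons take `&self`, yet can still
/// read more bytes into the cache thanks to the RefCell.
struct LazyInput<R> {
    inner: RefCell<Inner<R>>,
}

impl<R: Read> LazyInput<R> {
    fn new(reader: R) -> Self {
        Self { inner: RefCell::new(Inner { reader, cache: Vec::new() }) }
    }

    /// Returns true if the input starts with `prefix`, reading only as many
    /// 4-byte chunks as needed (small chunk size chosen for illustration).
    fn starts_with(&self, prefix: &[u8]) -> io::Result<bool> {
        // borrow_mut() panics if another borrow of `inner` is still alive.
        let mut inner = self.inner.borrow_mut();
        while inner.cache.len() < prefix.len() {
            let mut chunk = [0u8; 4];
            let n = inner.reader.read(&mut chunk)?;
            if n == 0 {
                return Ok(false); // EOF: the input is shorter than `prefix`
            }
            inner.cache.extend_from_slice(&chunk[..n]);
        }
        Ok(inner.cache.starts_with(prefix))
    }

    /// Number of bytes read from the underlying reader so far.
    fn cached_len(&self) -> usize {
        self.inner.borrow().cache.len()
    }
}
```

Checking "hello" against an 11-byte input reads only 8 bytes; a later, longer comparison pulls in the rest. Each call holds the RefMut guard only for its own duration, so sequential calls never trigger the runtime borrow panic.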