How to use moo lexer (and nearley) with large files

I am trying to find something that will parse very large files (PGN files, basically). I started with antlr4, but even though its classes are described as "streams", they aren't: antlr4 took my 5,457,518-game test file, tried to load the entire 1.7 GB file into one gigantic string before parsing it, and crashed with an out-of-memory error. So I threw it out and am now trying moo/nearley.

Well, it seems I have a similar problem. Even though both moo and nearley provide methods that take a so-called "chunk" as a parameter, moo in particular fails to realize that it has reached the end of its string and that the rest of the token could arrive with the next chunk.

My test program, for example, sends this to moo two bytes at a time: [Abcde "bc def"]. It emits LBRACKET correctly, but then it emits A as a symbol on its own. If I then call moo.reset(next_two), it emits bc as a second symbol, so Abcde is never recognized as a single token.
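One workaround would be to buffer the incoming chunks myself and only hand the lexer text that ends on a safe boundary. Below is a minimal sketch of that idea, assuming that no token ever spans a line break (that holds for PGN tag pairs like the one above, but probably not for multi-line brace comments). `createLineBuffer` and `onLine` are hypothetical names of my own, not part of moo or nearley.

```javascript
// Hypothetical helper, not part of moo or nearley.
// It accumulates arbitrary chunks and releases only complete lines, so the
// consumer never sees a string that stops in the middle of a token
// (ASSUMPTION: no token spans a line break).
function createLineBuffer(onLine) {
  let pending = ''; // text received so far that does not yet end in '\n'

  return {
    // Accept a chunk of any size and emit every complete line it finishes.
    feed(chunk) {
      pending += chunk;
      let nl;
      while ((nl = pending.indexOf('\n')) !== -1) {
        onLine(pending.slice(0, nl + 1)); // include the newline itself
        pending = pending.slice(nl + 1);
      }
    },
    // Call once at end of input to flush a final line without a newline.
    end() {
      if (pending.length > 0) {
        onLine(pending);
        pending = '';
      }
    },
  };
}

// Usage: feed the tag pair two characters at a time, as in my test program.
const lines = [];
const buf = createLineBuffer((line) => lines.push(line));
const input = '[Abcde "bc def"]\n';
for (let i = 0; i < input.length; i += 2) {
  buf.feed(input.slice(i, i + 2));
}
buf.end();
console.log(lines); // one complete line instead of many two-character fragments
```

Inside `onLine`, I would presumably call the moo lexer's `reset(line)` and pull tokens with `next()`, or pass the line to the nearley parser's `feed(line)`, but I have not verified that this is the intended way to use either library. Memory use stays bounded by the longest line rather than by the file size.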

So my question is, how exactly do you, master lexer/parser, do this? Should I go back to antlr4? Should I use moo/nearley in a different way? Is there a better lexer/parser out there? I really don't want to have to write my own from scratch, but I'm really starting to wonder if there is any other way.
