Generating a JavaScript SQL parser for SQLite3 (with Lemon? ANTLR3?)

For the last couple of weeks I've been diving into the pretty world of parsing SQL statements into something manageable, only to find out that I'll probably need a full lexer/parser to properly handle all the allowed tokens and formats.

I'm mostly interested in the create table statements, but a full generic parser would be even nicer, since nobody on the web seems to have this yet.

I'm not a computer science graduate, just self-taught, so this is quite the learning curve for me. The steps I took were:

  1. Parse SQL with regexes.
  2. That fails, so fix the regexes.
  3. That fails worse; dig through the SQLite source and find out that it uses the Lemon parser generator, a SQLite-specific project.
  4. Try to get a Lemon + PHP parser working, thinking I can convert that to JS by hand (failed).
  5. Try to get Emscripten working on the Lemon-generated parser in C (dependency hell, failed).
  6. Search for other parser/lexer generators; note ANTLR3.
  7. Try all day and night to get the SQLJet grammar file converted to JavaScript by changing the output format and backtracing errors.
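
To illustrate why steps 1 and 2 tend to break down: splitting the body of a create table statement on commas goes wrong as soon as a column type, default value or constraint contains a comma of its own. The sketch below is only an illustration, not a parser; `splitTopLevel` and the sample column list are made-up names, and it assumes no comments or `[bracketed]` identifiers appear in the input.

```javascript
// Split the body of a CREATE TABLE statement into top-level definitions,
// ignoring commas inside parentheses or quoted strings.
// Assumption: no SQL comments and no [bracketed] identifiers in the input.
function splitTopLevel(body) {
  const parts = [];
  let depth = 0;      // current parenthesis nesting level
  let quote = null;   // the quote character we are inside, if any
  let current = "";

  for (const ch of body) {
    if (quote) {
      // Inside a quoted string: only the matching quote ends it.
      if (ch === quote) {
        quote = null;
      }
    } else if (ch === "'" || ch === '"' || ch === "`") {
      quote = ch;
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
    } else if (ch === "," && depth === 0) {
      // A comma at nesting level 0 separates two definitions.
      parts.push(current.trim());
      current = "";
      continue;
    }
    current += ch;
  }
  if (current.trim()) {
    parts.push(current.trim());
  }
  return parts;
}

const body =
  "id INTEGER PRIMARY KEY, price DECIMAL(10,2), note TEXT DEFAULT 'a,b', CHECK (price > 0)";

const naive = body.split(",");      // cuts DECIMAL(10,2) and 'a,b' in half
const aware = splitTopLevel(body);  // keeps each definition intact
```

Even this only separates the definitions; making sense of each one (types, constraints, foreign keys) is where a real grammar becomes hard to avoid.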

I have been using the excellent ANTLRWorks GUI to try and figure out what is going wrong, but I'm unsure whether it's the JavaScript stack that's breaking, the Java stack, or that the .g file is written in the older v2 grammar format.

Is there anyone with parser/lexer generator experience who can point me in the right direction for generating a proper reusable SQLite parser? I seem to be able to generate the parsers in JavaScript for both MySQL and PL/SQL. Does that mean that the SQLite .g file is in need of updating?

There is 1 answer

Answer by Bart Kiers

To be able to use a grammar with the JavaScript target, you must:

  • change the target language in the options block: options { language=JavaScript; }
  • replace all embedded code in the .g grammar file (the stuff between { and }) with JavaScript code. Note that ANTLR does not convert this code for you based on the language value in the options block; you'll have to do that yourself!
  • when generating a parser, don't use ANTLRWorks but do so on the command line, and give it a large heap with Java's -Xmx argument: SQL grammars are large beasts that need a lot of memory
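
As a rough sketch of the command-line step, here is how the Java invocation could be assembled from a Node.js script. `org.antlr.Tool` is the ANTLR 3 command-line entry point; the helper name `antlr3Args`, the jar and grammar file names and the `1g` heap size are placeholders you would need to adapt, not values taken from the answer.

```javascript
// Build the argument list for running the ANTLR 3 tool with a larger heap.
// jarPath, grammarPath and the "1g" default are placeholders to adapt.
function antlr3Args(jarPath, grammarPath, maxHeap = "1g") {
  return [
    "-Xmx" + maxHeap,   // raise the JVM heap limit
    "-cp", jarPath,     // the ANTLR 3 "complete" jar on the classpath
    "org.antlr.Tool",   // ANTLR 3 command-line entry point
    grammarPath,        // the .g grammar to generate a lexer/parser from
  ];
}

// Possible usage under Node.js (left commented out so nothing runs on load):
// const { spawnSync } = require("child_process");
// spawnSync("java", antlr3Args("antlr-3.x-complete.jar", "Sql.g"), { stdio: "inherit" });
```

If generation still runs out of memory, increasing the heap value is the first thing to try.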

Here's a previous Q&A that shows how to use (and run) an ANTLR generated parser in combination with the JavaScript target: antlr3 - Generating a Parse Tree
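
For orientation, driving a generated parser from JavaScript roughly follows the usual ANTLR 3 pipeline of character stream, lexer, token stream, parser. In this sketch, `runtime` is assumed to be the `org.antlr.runtime` namespace from the ANTLR 3 JavaScript runtime, while `Lexer`, `Parser` and `startRule` stand in for whatever your grammar actually generates; the function name `parseWithAntlr3` is made up for the example.

```javascript
// Sketch: run an ANTLR 3 generated lexer/parser over a string.
// runtime   - assumed to be the org.antlr.runtime namespace of the JS runtime
// Lexer     - constructor of the generated lexer (hypothetical name)
// Parser    - constructor of the generated parser (hypothetical name)
// startRule - name of the grammar's entry rule (hypothetical)
function parseWithAntlr3(sql, runtime, Lexer, Parser, startRule) {
  const chars = new runtime.ANTLRStringStream(sql);    // character stream
  const lexer = new Lexer(chars);                       // generated lexer
  const tokens = new runtime.CommonTokenStream(lexer);  // token stream
  const parser = new Parser(tokens);                    // generated parser
  if (typeof parser[startRule] !== "function") {
    throw new Error("Parser has no rule named " + startRule);
  }
  return parser[startRule]();                           // invoke the entry rule
}
```

What the entry rule returns depends on the grammar (for example, whether it has output=AST set), so treat the return value as grammar-specific.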

HTH