Say I have a grammar like this:
match start
match first
match rule a
match rule b
match a
match string, "a"
match b
match string, "b("
match optional
match rule start
match optional
match many
match string, ","
match rule start
match string, ")"
It compiles into JSON like this:
{
"start": {
"type": "rule",
"name": "start",
"children": [
{
"type": "match-first",
"children": [
{
"type": "match-rule",
"rule": "a"
},
{
"type": "match-rule",
"rule": "b"
}
]
}
]
},
"a": {
"type": "rule",
"name": "a",
"children": [
{
"type": "match-string",
"value": {
"type": "string",
"value": "a"
}
}
]
},
"b": {
"type": "rule",
"name": "b",
"children": [
{
"type": "match-string",
"value": {
"type": "string",
"value": "b("
}
},
{
"type": "match-optional",
"children": [
{
"type": "match-rule",
"rule": "start"
}
]
},
{
"type": "match-optional",
"children": [
{
"type": "match-many",
"children": [
{
"type": "match-string",
"value": {
"type": "string",
"value": ","
}
},
{
"type": "match-rule",
"rule": "start"
}
]
}
]
},
{
"type": "match-string",
"value": {
"type": "string",
"value": ")"
}
}
]
}
}
It can match any of these strings theoretically:
a
b()
b(a)
b(a,a)
b(b(b()))
b(a,b(a,a,b(),a,a))
How do I generate an optimized parser (that simply returns true/false on whether or not the input string matches) out of this grammar JSON? My initial inkling is to try and do this like my recursive descent parser that just takes a grammar and a string and parses using the "uncompiled" grammar. But it was pointed out that this would be inefficient and is why "parser-generators" generate parsers specific to each grammar. How can that be done using this grammar provided in JSON form? Could it involve "parse tables" somehow? I am not really sure how this could be done. Ideally it would be done without recursion, somehow just looping and shifting the parser state appropriately.
I am not really sure how to begin because I don't see what the desired structure is supposed to be from the output of a generateParser(grammar) function.
function generateParser(grammar) {
const table = []
let x = 0
for (let key in grammar) {
let rule = grammar[key]
for (let i = 0, n = rule.children.length; i < n; i++) {
let pattern = rule.children[i]
switch (pattern.type) {
case 'match-first':
addMatchFirst(pattern)
break
}
}
}
return function parse(string) {
// iterate through table and transition somehow?
}
function addMatchFirst(p) {
table.push((state) => {
state.stack.push(/* new state? */)
})
for (let i = 0, n = p.children.length; i < n; i++) {
let pattern = p.children[i]
switch (pattern.type) {
case 'match-rule':
addMatchRule(pattern)
break
}
}
}
function addMatchRule(p) {
table.push((state) => {
state.stack.push(/* new state? */)
})
}
}
Looking at a parse table implementation in JavaScript seems to primitive and I don't see how to apply it.
If it's not possible to make this using a table, how else would you generate the parsing algorithm? All I am looking for is a true/false answer from the generated parser on whether or not the input string matches the grammar.
How you can parse depends on the semantics of your grammar notation. It has various constructs where there are alternatives:
If, at every point where there are alternatives, the semantics are that each alternative is "valid", then your notation basically defines a Context Free Grammar, so a large set of parsing options are available to you.
However, the wording of the "match first" line suggests that it means "try the 'children' in the order given and use the first that succeeds". If that's the case, then it looks like your notation is defining a Parsing Expression Grammar, so the parsing options are more limited. Packrat parsers seem popular.
It's unclear if you could use a table-based parsing algorithm such as you link to -- the Wikipedia article says "It is also possible to build LL parsers and LR parsers from parsing expression grammars", which implies that you could use an LL or LR parsing table. However, there's no citation for that statement, and I'm doubtful that it's true.
Did someone give you the
generateParsercode as a start at a parser implementation? It has a variable namedtable, but note that it isn't a table in the LL or LR sense, though there might be some similarities. At any rate, it appears to have some useful chunks of code.