How can I express this format in EBNF?


I have the following data:

dbCon = {
    main = {
        database = "db1",
        hostname = "db1.serv.com",
        maxConnCount = "5",
        port = "3306",
        slaves = [
            {
                charset = "utf8",
                client = "MYSQL",
                compression = "true",
                database = "db1_a",
                hostname = "db1-a.serv.com",
                maxConnCount = "5",
                port = "3306",
            }
            {
                charset = "utf8",
                client = "MYSQL",
                compression = "true",
                database = "db1_b",
                hostname = "db1-b.serv.com",
                maxConnCount = "5",
                port = "3306",
            }
        ]
        username = "user-1"
    }
}

I'm trying to use Grako to convert this into JSON, but I can't get the EBNF format correct. Here's what I have:

import grako
import json

grammar_ebnf = """
    final = @:({ any } | { bracketed } | { braced });
    braced = '{' @:( { bracketed } | { braced } | { any } ) '}' ;
    bracketed = '[' @:( { braced } | { bracketed } | { any } ) ']' ;
    any = /^[^\[\{\]\}\n]+/ ;
"""

model = grako.genmodel("final", grammar_ebnf)
with open('out.txt') as f:
    ast = model.parse(f.read())
    print (json.dumps(ast, indent = 4))

However, this just prints out:

[
    "dbCon = "
]

Where am I going wrong? I've never used Grako. I just want to be able to parse this into something usable/accessible, without designing a static parser in case the format changes. If the format changes later, it seems easier to update the EBNF rather than reworking a whole parser.

There is 1 answer

Jay Kominek (best answer)

With only one example it's hard to be sure what the real grammar is, but hopefully this is enough for you to finish tweaking it to handle any weirdness.

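We need the Semantics class to convert the key/value pairs, and the lists of them, into dictionaries. Beyond that, careful use of @: does the job: it makes a rule return only the marked sub-expression, so the braces, brackets, quotes and optional commas drop out of the result.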
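
To see the conversion step in isolation, here is a minimal sketch that needs no Grako at all. It assumes the kvpair rule hands its semantic action a three-part sequence (key, '=', value) and that the dict rule hands over a list of the resulting one-entry dictionaries. The function names kvpair_to_dict and merge_pairs and the sample data are made up for illustration.

```python
# Hypothetical stand-ins for the two semantic actions; not part of Grako.

def kvpair_to_dict(parts):
    # parts is assumed to look like [key, '=', value]
    key, _equals, value = parts
    return {key: value}


def merge_pairs(pairs):
    # Fold a sequence of one-entry dicts into a single dict.
    merged = {}
    for pair in pairs:
        merged.update(pair)
    return merged


# Illustrative input shaped like the unprocessed matches of two kvpair rules.
raw_pairs = [
    ["database", "=", "db1"],
    ["port", "=", "3306"],
]

result = merge_pairs(kvpair_to_dict(p) for p in raw_pairs)
print(result)
```

Note that a key appearing twice inside one block would silently keep only the last value, because update overwrites earlier entries.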

As a suggestion, when naming rules in a grammar, name them after what they are (list, dict, etc.), not what they look like (braced, bracketed). Also, split things up into lots of rules to start with; you can always coalesce them later.

#!/usr/bin/python

import grako
import json

# Raw string so the regex backslash in the key rule reaches Grako unchanged.
grammar = r"""
final = kvpair;
kvpair = key '=' value;
key = /[^\s=]+/;
value = @:(dict | list | string) [','];
list = '[' @:{ value } ']';
string = '"' @:/[^"]*/ '"';
dict = '{' @:{ kvpair } '}';
"""

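class Semantics(object):
    # Receives the three parts matched by the kvpair rule.
    def kvpair(self, arg):
        key, ignore, value = arg
        return { key: value }
    # Receives the list of one-entry dicts built by kvpair and merges them.
    def dict(self, arg):
        d = { }
        for v in arg:
            d.update(v)
        return d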

model = grako.genmodel("final", grammar)

with open('out.txt') as f:
    ast = model.parse(f.read(), semantics=Semantics())
    print(json.dumps(ast, indent=4))

This produces the following output (the order of keys within each object may differ between Python versions):

{
    "dbCon": {
        "main": {
            "username": "user-1",
            "maxConnCount": "5",
            "slaves": [
                {
                    "maxConnCount": "5",
                    "hostname": "db1-a.serv.com",
                    "compression": "true",
                    "database": "db1_a",
                    "charset": "utf8",
                    "port": "3306",
                    "client": "MYSQL"
                },
                {
                    "maxConnCount": "5",
                    "hostname": "db1-b.serv.com",
                    "compression": "true",
                    "database": "db1_b",
                    "charset": "utf8",
                    "port": "3306",
                    "client": "MYSQL"
                }
            ],
            "database": "db1",
            "hostname": "db1.serv.com",
            "port": "3306"
        }
    }
}
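
Since the goal was something usable/accessible, here is a short sketch of reading values back out of that structure. It uses a trimmed copy of the JSON above so that it runs on its own; the variable names are illustrative, and the int() conversion is an assumption about how you might want to treat the port strings.

```python
import json

# Trimmed copy of the output above, kept small for the example.
text = """
{
    "dbCon": {
        "main": {
            "hostname": "db1.serv.com",
            "slaves": [
                {"hostname": "db1-a.serv.com", "port": "3306"},
                {"hostname": "db1-b.serv.com", "port": "3306"}
            ]
        }
    }
}
"""

config = json.loads(text)
main = config["dbCon"]["main"]

# Every value in the parsed result is a string, so convert where needed.
slave_hosts = [slave["hostname"] for slave in main["slaves"]]
slave_ports = [int(slave["port"]) for slave in main["slaves"]]

print(main["hostname"])
print(slave_hosts)
print(slave_ports)
```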