I am using the Ruby Parslet gem.
I am parsing a document similar to a SystemVerilog (SV) interface, e.g.:
```
interface my_intf;
protocol validonly;
transmit [Bool] valid;
transmit [Bool] pipeid;
transmit [5:0] incr;
transmit [Bool] sample;
endinterface
```
Here is my parser:
```ruby
require 'parslet'

class MyParse < Parslet::Parser
  rule(:lparen) { space? >> str('(') >> space? }
  rule(:rparen) { space? >> str(')') >> space? }
  rule(:lbox) { space? >> str('[') >> space? }
  rule(:rbox) { space? >> str(']') >> space? }
  rule(:lcurly) { space? >> str('{') >> space? }
  rule(:rcurly) { space? >> str('}') >> space? }
  rule(:comma) { space? >> str(',') >> space? }
  rule(:semicolon) { space? >> str(';') >> space? }
  rule(:eof) { any.absent? }
  rule(:space) { match["\t\s"] }
  rule(:whitespace) { space.repeat }
  rule(:space?) { whitespace.maybe }
  rule(:blank_line) { space? >> newline.repeat(1) }
  rule(:newline) { str("\n") }
  # Things
  rule(:integer) { space? >> match('[0-9]').repeat(1).as(:int) >> space? }
  rule(:identifier) { match['a-z'].repeat(1) }
  rule(:intf_start) { space? >> str('interface') >> space? >> (match['a-zA-Z_'].repeat(1,1) >> match['[:alnum:]_'].repeat(0)).as(:intf_name) >> space? >> str(';') >> space? >> str("\n") }
  rule(:protocol) { space? >> str('protocol') >> whitespace >> (str('validonly').maybe).as(:protocol) >> space? >> str(';') >> space? >> str("\n") }
  rule(:bool) { lbox >> space? >> str('Bool').as(:bool) >> space? >> rbox }
  rule(:transmit_width) { lbox >> space? >> match('[0-9]').repeat.as(:msb) >> space? >> str(':') >> space? >> match('[0-9]').repeat.as(:lsb) >> space? >> rbox }
  rule(:transmit) { space? >> str('transmit') >> whitespace >> (bool | transmit_width) >> whitespace >> (match['a-zA-Z_'].repeat(1,1) >> match['[:alnum:]_'].repeat(0)).as(:transmit_name) >> space? >> str(';') >> space? >> str("\n") }
  rule(:interface_body) { (protocol | blank_line.maybe) }
  rule(:interface) { intf_start >> interface_body }
  rule(:expression) { ( interface ).repeat }
  root :expression
end
```
I am having trouble writing the rule for interface_body.
It can contain zero or more transmit lines, zero or one protocol line, and any number of blank lines, comments, etc.
The rules in the snippet work for a single transmit line and a single protocol line (i.e. each one matches correctly on its own), but parsing a whole interface does not work. Can someone please help me out?
Thanks in advance.
OK, this parses the file you posted. I don't know the full format, so I can't promise it will work for all your files, but hopefully it will get you started.
The main changes:
Don't consume whitespace on both sides of your tokens. You had expressions that parsed "[Bool] valid" as LBOX BOOL RBOX SPACE? and then expected another WHITESPACE, which could not be found because the previous rule had already consumed it.
When an expression can validly match zero characters (e.g. something with repeat(0) or .maybe) and there is a problem with how it's written, you get an odd error: the rule passes by matching nothing, and then the next rule typically fails instead. I explicitly defined 'body lines' as 'anything that is not the end line', so that a bad body line fails with an error.
'repeat' defaults to a minimum of 0, i.e. repeat(0), which I would love to change; I see mistakes caused by this all the time.
x.repeat(1,1) means exactly one match, which is the same as just writing x. :)
There were more whitespace problems as well.
So... write your parser from the top down, and write your tests from the bottom up. When your tests reach the top, you are done! :)
Good luck.