How to capture a string without quote characters


I'm trying to capture quoted strings without the quotes. I have this terminal:

%token <string> STRING

and this production:

constant:
    | QUOTE STRING QUOTE { String($2) }

along with these lexer rules:

| '\''       { QUOTE }
| [^ '\'']*  { STRING (lexeme lexbuf) } //final regex before eof

It seems to be interpreting everything leading up to a QUOTE as a single lexeme, which doesn't parse. So maybe my problem is elsewhere in the grammar--not sure. Am I going about this the right way? It was parsing fine before I tried to exclude quotes from strings.

Update

I think there may be some ambiguity with the following lexer rules:

let name = alpha (alpha | digit | '_')*
let identifier = name ('.' name)*

The following rule comes before the STRING rule:

| identifier    { ID (lexeme lexbuf) }

Is there any way to disambiguate these without including quotes in the STRING regex?


There are 3 answers

3
Stephen Swensen (best answer)

It's pretty normal to do semantic analysis in the lexer for constants like strings and numeric literals, so you might consider a lex rule for your string constants like

| '\'' [^ '\'']* '\'' 
    { STRING (let s = lexeme lexbuf in s.Substring(1, s.Length - 2)) }
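
To make the quote stripping in that action concrete, here is a minimal F# sketch of the same Substring logic as a standalone function. The name stripQuotes is hypothetical (it is not part of fslex), and the sketch assumes the lexeme already matched the rule above, so it starts and ends with a single quote.

```fsharp
// Hypothetical helper, not part of fslex: drops the first and last character
// of a lexeme that matched  '\'' [^ '\'']* '\''.
let stripQuotes (s: string) : string =
    if s.Length >= 2 && s.[0] = '\'' && s.[s.Length - 1] = '\'' then
        s.Substring(1, s.Length - 2)
    else
        s // not a quoted lexeme; return it unchanged

printfn "%s" (stripQuotes "'hello world'") // prints: hello world
printfn "[%s]" (stripQuotes "''")          // prints: []
```

Because the whole quoted literal is matched by one rule that starts with a quote, the identifier rule never gets a chance to split the contents.
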
1
jbondia

I had a similar problem. I capture the strings in the "lexic.l" file using lexer states, as described in my self-answered question.

0
Vitaliy

You can keep the quotes in the lexeme and trim them in the parser.

Lexer:

let constant       = ("'" ([^ '\''])* "'")
...
| constant         { STRING(lexeme lexbuf) } 

Parser:

%token <string> STRING

...
constant:
    | STRING { $1.Trim('\'') }

Or, if you want the quotes to be separate tokens:

Lexer:

let name = alpha (alpha | digit | '_')*
let identifier = name ('.' name)*
...

| '\''       { QUOTE }
| identifier { ID (lexeme lexbuf) }
| _          { STRING (lexeme lexbuf) } 

Because the identifier rule comes first, it takes characters away from STRING, so the token stream for a quoted string can look like QUOTE ID STRING ID ... QUOTE, and you have to reassemble it in the parser:

Parser:

constant:
     | QUOTE content QUOTE     { String($2) }

content:
     | ID content      { $1+$2 }
     | STRING content  { $1+$2 }
     | ID              { $1 }
     | STRING          { $1 }
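
As a rough illustration of what those content productions do, the following F# sketch reassembles a string from a token list by hand. The Token type and the content and constant functions are hypothetical stand-ins for what fsyacc would generate from the grammar; the sketch also assumes the lexer has no whitespace-skipping rule, so spaces arrive as STRING tokens.

```fsharp
// Hypothetical stand-in for the token type generated from the grammar.
type Token =
    | QUOTE
    | ID of string
    | STRING of string

// Concatenates leading ID/STRING tokens, mirroring the `content` productions.
// Returns the joined text together with the tokens that were not consumed.
let rec content (tokens: Token list) : string * Token list =
    match tokens with
    | ID s :: rest
    | STRING s :: rest ->
        let tail, remaining = content rest
        s + tail, remaining
    | _ -> "", tokens

// Mirrors `constant: QUOTE content QUOTE`. The grammar requires at least one
// content token, which is approximated here by rejecting empty text.
let constant (tokens: Token list) : string option =
    match tokens with
    | QUOTE :: rest ->
        let text, remaining = content rest
        if text <> "" && List.tryHead remaining = Some QUOTE then Some text else None
    | _ -> None

// Tokens the lexer above would plausibly emit for 'foo.bar baz'
let result = constant [ QUOTE; ID "foo.bar"; STRING " "; ID "baz"; QUOTE ]
printfn "%A" result
```

If the lexer does discard whitespace elsewhere, the reassembled text loses it, which is one reason to prefer matching the whole literal in a single lexer rule.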