I want to match the word "String" in the line "String helloString String Stringhello helloStringhello".
Only the two standalone occurrences of "String" (the first word and the middle one) should be matched.
The "String" inside "helloString", "Stringhello", or "helloStringhello" should not be matched.
This is my RE:
<YYINITIAL> (String) {return new Token(TokenType.Type_String,yytext());}
But it matches every occurrence of "String".
My JLex code:
import java.io.*;
enum TokenType {Type_String,Identifier}
class Token{
    String text;
    TokenType type;
    Token(TokenType type,String text)
    {
        this.text=text;
        this.type=type;
    }
    public String toString()
    {
        return String.format("[%s,%s]",type,text);
    }
}
%%
%class Lexer
%public
%function getNextToken
%type Token
%{
    public static void main(String[] args) throws IOException {
        FileReader r = new FileReader("in.txt");
        Lexer l = new Lexer(r);
        Token tok;
        while((tok=l.getNextToken())!=null){
            System.out.println(tok);
        }
        r.close();
    }
%}
%line
%char
SPACE=[\r\t\n\f\ ]
ALPHA=[a-zA-Z]
DIGIT=[0-9]
ID=({ALPHA}|_)({ALPHA}|{DIGIT}|_)*
%%
<YYINITIAL> {ID} {return new Token(TokenType.Identifier,yytext());}
<YYINITIAL> (String) {return new Token(TokenType.Type_String,yytext());}
<YYINITIAL> {SPACE}* {}
<YYINITIAL> . {System.out.println("error - "+yytext());}
If I run your code on your example input, I don't see the behaviour you describe. The words "helloString" etc. aren't recognized as tokens of type Type_String, but as tokens of type Identifier, which I assume is the intended behaviour. So that part is actually working fine.

What isn't working fine is that "String" by itself is also recognized as an identifier. The reason is that if two rules can produce a match of the same length, the rule that comes first is chosen. You've defined the rule for identifiers before the rule for the String keyword, so the identifier rule is always chosen. If you switch the two rules around, "String" by itself will be recognized as Type_String and everything else will be recognized as an identifier.
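
To see the tie-breaking rule in isolation, here is a small plain-Java sketch that imitates it. It is not JLex-generated code: the RuleOrderDemo class, its Rule helper, and the tokenize method are hypothetical names made up for this illustration, and the patterns are java.util.regex approximations of the ID, String, and SPACE rules from your spec. The tokenizer picks the longest match at each position and, when lengths tie, keeps the rule listed first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RuleOrderDemo {
    /** One lexer rule: a token name plus the pattern it matches (hypothetical helper). */
    static final class Rule {
        final String name;
        final Pattern pattern;

        Rule(String name, String regex) {
            this.name = name;
            this.pattern = Pattern.compile(regex);
        }
    }

    /** Longest match wins; on a tie in length, the rule listed first wins. */
    static List<String> tokenize(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule rule : rules) {
                Matcher m = rule.pattern.matcher(input);
                m.region(pos, input.length());
                // lookingAt() only succeeds for a match starting at pos.
                // The strict '>' keeps the earlier rule when lengths are equal.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = rule;
                    bestLen = m.end() - pos;
                }
            }
            if (best == null) {
                pos++; // no rule matched here: skip one character
                continue;
            }
            if (!best.name.equals("Space")) {
                String text = input.substring(pos, pos + bestLen);
                tokens.add("[" + best.name + "," + text + "]");
            }
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "String helloString String Stringhello helloStringhello";
        Rule id = new Rule("Identifier", "[a-zA-Z_][a-zA-Z0-9_]*");
        Rule kw = new Rule("Type_String", "String");
        Rule space = new Rule("Space", "[ \\r\\t\\n\\f]+");

        // Same order as in the question: {ID} listed before (String).
        System.out.println(tokenize(input, Arrays.asList(id, kw, space)));
        // Swapped order: (String) listed before {ID}.
        System.out.println(tokenize(input, Arrays.asList(kw, id, space)));
    }
}
```

With the original order every word comes out as Identifier. With the swapped order the two standalone "String" words come out as Type_String, while "helloString", "Stringhello", and "helloStringhello" stay Identifier, because the identifier pattern produces a longer match for them than the keyword pattern does.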