How can I select a specific word with a regular expression in JLex?


I want to match the word "String" in the line "String helloString String Stringhello helloStringhello".

Only the two standalone occurrences of "String" (the first word and the middle one) should be selected.

The "String" inside "helloString", "Stringhello", or "helloStringhello" should not be selected.

This is my RE:

<YYINITIAL> (String) {return new Token(TokenType.Type_String,yytext());}

But it selects every occurrence of "String".

My JLex code:

import java.io.*;
enum TokenType {Type_String,Identifier}
class Token {
  String text;
  TokenType type;

  Token(TokenType type, String text) {
    this.text = text;
    this.type = type;
  }

  public String toString() {
    return String.format("[%s,%s]", type, text);
  }
}
%%
%class Lexer
%public
%function getNextToken
%type Token
%{
  public static void main(String[] args) throws IOException {
    FileReader r = new FileReader("in.txt");
    Lexer l = new Lexer(r);
    Token tok;
    while ((tok = l.getNextToken()) != null) {
      System.out.println(tok);
    }
    r.close();
  }
%}
%line
%char
SPACE=[\r\t\n\f\ ]
ALPHA=[a-zA-Z]
DIGIT=[0-9]
ID=({ALPHA}|_)({ALPHA}|{DIGIT}|_)*



%%
<YYINITIAL> {ID} {return new Token(TokenType.Identifier,yytext());}
<YYINITIAL> (String) {return new Token(TokenType.Type_String,yytext());}
<YYINITIAL> {SPACE}* {}
<YYINITIAL> . {System.out.println("error - "+yytext());}

There are 2 answers

5
sepp2k

If I run your code on your example input, I don't see the behaviour you describe. The words helloString etc. aren't recognized as tokens of type Type_String, but as tokens of type Identifier, which I assume is the intended behaviour. So that part is actually working fine.

What isn't working is that String by itself is also recognized as an identifier. The lexer always prefers the longest match, which is why helloString and the like come out as identifiers. When two rules match text of the same length, though, the rule that comes first in the specification wins. You've defined the identifier rule before the rule for the String keyword, so for the input String the identifier rule is always chosen. If you swap the two rules, String by itself will be recognized as Type_String and everything else will still be recognized as an identifier.
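As a minimal sketch of that reordering, assuming the macros and TokenType values from the question's specification are unchanged, the rules section after the second %% could read as follows. The comments are placed inside the Java action blocks, since those are copied into the generated lexer as ordinary Java.

```
<YYINITIAL> (String) { /* keyword rule listed first, so it wins a tie with {ID} */ return new Token(TokenType.Type_String,yytext()); }
<YYINITIAL> {ID} { /* longer matches such as Stringhello still end up here */ return new Token(TokenType.Identifier,yytext()); }
<YYINITIAL> {SPACE}* {}
<YYINITIAL> . {System.out.println("error - "+yytext());}
```

With this order, the input String matches both of the first two rules at length 6 and the keyword rule is chosen, while Stringhello matches {ID} at length 11 and (String) only at length 6, so the longer identifier match is preferred.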

1
Adham Mostafa

This is my second JLex specification:

import java.io.*;
enum TokenType {OutPut_Instruction,Quoted_Stentence,Semi,L_Pracet,R_Pracet,Type_int,Type_double,Type_String,Identifier}
class Token {
  String text;
  TokenType type;

  Token(TokenType type, String text) {
    this.text = text;
    this.type = type;
  }

  public String toString() {
    return String.format("[%s,%s]", type, text);
  }
}
%%
%class Lexer
%public
%function getNextToken
%type Token
%{
  public static void main(String[] args) throws IOException {
    FileReader r = new FileReader("in.txt");
    Lexer l = new Lexer(r);
    Token tok;
    while ((tok = l.getNextToken()) != null) {
      System.out.println(tok);
    }
    r.close();
  }
%}
%line
%char
SPACE=[\r\t\n\f\ ]
SEMI_COLO=[;]
L_P=[(]
R_P=[)]
DOUBLE_COT="\""([^\n\"]*(\\[.])*)*"\""
PRINT=(Print)
ALPHA=[a-zA-Z]
DIGIT=[0-9]
INT=(int)
DOUBLE=(double)
STRING=(String)
TYPE=(int)|(double)|(String)
ID=({ALPHA}|_)({ALPHA}|{DIGIT}|_)*



%%
<YYINITIAL> {L_P} {return new Token(TokenType.L_Pracet,yytext());}
<YYINITIAL> {R_P} {return new Token(TokenType.R_Pracet,yytext());}
<YYINITIAL> {SEMI_COLO} {return new Token(TokenType.Semi,yytext());}
<YYINITIAL> {PRINT} {return new Token(TokenType.OutPut_Instruction,yytext());}
<YYINITIAL> [^{TYPE}\ ]{ID} {return new Token(TokenType.Identifier,yytext());}
<YYINITIAL> {INT} {return new Token(TokenType.Type_int,yytext());}
<YYINITIAL> {DOUBLE} {return new Token(TokenType.Type_double,yytext());}
<YYINITIAL> {STRING} {return new Token(TokenType.Type_String,yytext());}
<YYINITIAL> {DOUBLE_COT} {return new Token(TokenType.Quoted_Stentence,yytext());}
<YYINITIAL> {SPACE}* {}
<YYINITIAL> . {System.out.println("error - "+yytext());}

This is the input:

> ah String ah Stringahmredgah Sahmed String int

This is the output:

[Identifier,ah]
[Type_String,String]
[Identifier,ah]
[Type_String,String]
[Identifier,ahmredgah]
error - S
[Identifier,ahmed]
[Type_String,String]
[Type_int,int]
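
The "error - S" line and the split of Stringahmredgah into String plus ahmredgah are consistent with [^{TYPE}\ ] being read as a negated character class over single characters (the individual characters of the expanded TYPE macro, such as S, i, n, t, d) rather than as "not one of the type keywords". Under that reading, any identifier that begins with one of those characters, or that is only one character long, cannot match the identifier rule. A hedged sketch that drops the class and relies on rule order instead, as the first answer describes, assuming the same macros and TokenType values as the specification above and leaving the other rules unchanged:

```
<YYINITIAL> {PRINT} {return new Token(TokenType.OutPut_Instruction,yytext());}
<YYINITIAL> {INT} {return new Token(TokenType.Type_int,yytext());}
<YYINITIAL> {DOUBLE} {return new Token(TokenType.Type_double,yytext());}
<YYINITIAL> {STRING} {return new Token(TokenType.Type_String,yytext());}
<YYINITIAL> {ID} { /* plain identifier rule, listed after every keyword rule */ return new Token(TokenType.Identifier,yytext()); }
```

Because the longest match is preferred, Stringahmredgah and Sahmed would each be expected to come out as a single Identifier token, while String and int on their own tie with {ID} and go to the keyword rules listed earlier.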