I am attempting to write a basic scanner using Flex. I want identifiers to start with a letter and contain only letters and digits. The rules I have written for this are:
[A-Za-z][A-Za-z0-9]* { /* identifier */ }
0|{non_zero_digits}{digits}* { /* integer */ }
. { /* invalid */ }
I expect 123abc to be classified as a single invalid sequence. However, the scanner currently matches 123 as an integer and then abc as an identifier.
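As far as I understand, this follows from the longest-match rule: at each position the scanner takes the longest text that any rule matches and then starts again right after it. To make the behaviour concrete, here is a small hand-written C sketch that mimics the three rules above. It is not Flex output, and the function names and the one-letter token codes are made up for illustration:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the longest identifier starting at s, or 0. */
static size_t match_identifier(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Hypothetical helper: length of the longest integer starting at s, or 0.
   Mirrors 0|{non_zero_digits}{digits}*: a lone 0, or a nonzero digit
   followed by any number of digits. */
static size_t match_integer(const char *s) {
    size_t n = 0;
    if (s[0] == '0') return 1;
    if (!isdigit((unsigned char)s[0])) return 0;
    while (isdigit((unsigned char)s[n])) n++;
    return n;
}

/* Scans input with longest-match semantics and writes one letter per token
   into out: 'I' identifier, 'N' integer, 'X' any other single character.
   Stops early if out is full. */
static void scan_kinds(const char *input, char *out, size_t out_size) {
    size_t used = 0;
    assert(input != NULL && out != NULL && out_size > 0);
    while (*input != '\0' && used + 1 < out_size) {
        size_t id = match_identifier(input);
        size_t num = match_integer(input);
        if (id > 0)       { out[used++] = 'I'; input += id; }
        else if (num > 0) { out[used++] = 'N'; input += num; }
        else              { out[used++] = 'X'; input += 1; }
    }
    out[used] = '\0';
}
```

With this sketch, calling scan_kinds on "123abc" fills the buffer with "NI", that is, an integer followed by an identifier. This is the same split I see from Flex: nothing in the rules forces a separator between the two tokens.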
Is there a way to require certain delimiters (space, tab, ; , " \n, etc.) between the detected tokens so that this doesn't happen?
I tried defining delimiters and changing the regular expressions to something like:
[A-Za-z][A-Za-z0-9]*{delimiters} { /* identifier */ }
0|{non_zero_digits}{digits}*{delimiters} { /* integer */ }
. { /* invalid */ }
This works fine for identifiers but causes a problem with strings: the opening " is consumed as the delimiter of the previous token, so the string that follows is not processed correctly.