I'm new to the world of compiler construction, and I want to know: what are the differences between a direct-coded and a table-driven lexical analyzer?
Please use simple source code examples if possible.
Thanks.
Edit:
In the book Engineering a Compiler, the authors divide lexers into three types: table-driven, direct-coded, and hand-coded.
I shall assume that by "direct-coded" you mean a hand-written lexer rather than one produced as the output of a lexer generator. In that case... (see the edit below)... a table-driven lexer is a (usually automatically generated) program that uses some kind of lookup table to determine which action to take. Consider the finite automaton that corresponds to the regular expression
ab*a
(intentionally not minimized). If we limit ourselves to considering only the characters 'a' and 'b', we can encode it in a lookup table.
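Number the states 0 through 4, with 0 as the start state: 'a' takes state 0 to state 1; from state 1, 'b' leads to state 2, where further 'b's loop back; and an 'a' out of state 1 or state 2 reaches the accepting states 3 and 4 respectively (keeping two separate accepting states is exactly what makes this DFA non-minimal). With that numbering, a sketch in C might look like this:

```c
enum { ERR = -1 };  /* dead/error state */

/* transition[state][input], where input 0 = 'a' and input 1 = 'b' */
static const int transition[5][2] = {
    /* state 0: start            */ {   1, ERR },
    /* state 1: seen "a"         */ {   3,   2 },
    /* state 2: seen "ab...b"    */ {   4,   2 },
    /* state 3: accepts "aa"     */ { ERR, ERR },
    /* state 4: accepts "ab..ba" */ { ERR, ERR },
};

/* accept[state] is nonzero iff the state is accepting */
static const int accept[5] = { 0, 0, 0, 1, 1 };
```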
Now we just need a driver that actually uses the table:
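Something along these lines, reusing the transition and accept arrays from the sketch above:

```c
#include <stdio.h>

/* Returns 1 if the whole string matches ab*a, 0 otherwise. */
static int match(const char *s)
{
    int state = 0;
    for (; *s != '\0'; s++) {
        int input;
        if (*s == 'a')
            input = 0;
        else if (*s == 'b')
            input = 1;
        else
            return 0;                  /* character outside the alphabet */
        state = transition[state][input];
        if (state == ERR)
            return 0;
    }
    return accept[state];
}

int main(void)
{
    printf("%d\n", match("aa"));       /* 1 */
    printf("%d\n", match("abbba"));    /* 1 */
    printf("%d\n", match("ab"));       /* 0 */
    return 0;
}
```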
Of course, in a real lexer every token would have a corresponding accepting state, and you would have to somehow modify the accept table to contain the token identifier (see the sketch just below). I want to stress two things, though: writing tables like these by hand is tedious and error-prone, which is why table-driven lexers are almost always machine-generated; and the driver never has to change, because lexing a different language is purely a matter of swapping in different tables.
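For instance (the token names here are made up for illustration), the accept array could map states to token identifiers instead of plain booleans:

```c
enum token_id { TOK_NONE = 0, TOK_ABSTARA };

/* TOK_NONE for non-accepting states, the recognized
   token's identifier for accepting ones. */
static const enum token_id accept_token[5] = {
    TOK_NONE, TOK_NONE, TOK_NONE, TOK_ABSTARA, TOK_ABSTARA
};
```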
A hand-written (for lack of a better name) lexer doesn't usually use a lookup table. Suppose we want a lexer for a simple Lisp-like language that has parentheses, identifiers and decimal integers:
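A rough sketch in C; the token names and the choice to read straight from a FILE are illustrative choices of mine:

```c
#include <ctype.h>
#include <stdio.h>

enum token { TOK_LPAREN, TOK_RPAREN, TOK_IDENT, TOK_NUMBER, TOK_EOF, TOK_ERROR };

static enum token next_token(FILE *in)
{
    int c = getc(in);

    /* Skip whitespace between tokens. */
    while (isspace(c))
        c = getc(in);

    if (c == EOF)  return TOK_EOF;
    if (c == '(')  return TOK_LPAREN;
    if (c == ')')  return TOK_RPAREN;

    if (isdigit(c)) {
        /* Decimal integer: consume the remaining digits.
           (A real lexer would buffer them to recover the value.) */
        while (isdigit(c = getc(in)))
            ;
        ungetc(c, in);
        return TOK_NUMBER;
    }

    if (isalpha(c)) {
        /* Identifier: a letter followed by letters or digits. */
        while (isalnum(c = getc(in)))
            ;
        ungetc(c, in);
        return TOK_IDENT;
    }

    return TOK_ERROR;
}
```

Notice that there is no dispatch table anywhere; the structure of the code itself encodes the states.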
Like the table-driven lexer example, this one is not complete. For one thing, it needs some kind of buffering to store the characters that make up tokens like IDENT and NUMBER. For another, it doesn't handle EOF particularly gracefully. But hopefully you get the gist of it.

Edit: Based on the definition in Engineering a Compiler, a direct-coded lexer is basically a hybrid of the two. Rather than using a table, we use code labels to represent states. Let's see how that would look using the same DFA as before:
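A sketch in C, using goto with one label per state (I have folded the two accepting states of the table version into a single label, since they behave identically):

```c
/* Direct-coded recognizer for ab*a: DFA states become code labels. */
static int match_direct(const char *s)
{
state0:                      /* start state */
    switch (*s++) {
    case 'a': goto state1;
    default:  return 0;
    }

state1:                      /* seen "a" */
    switch (*s++) {
    case 'a': goto accept;   /* "aa" */
    case 'b': goto state2;
    default:  return 0;
    }

state2:                      /* seen "ab...b" */
    switch (*s++) {
    case 'a': goto accept;   /* "ab...ba" */
    case 'b': goto state2;
    default:  return 0;
    }

accept:
    return *s == '\0';       /* accept only at the end of the input */
}
```

The transitions are compiled directly into the control flow, so there is no table lookup at run time; that is what makes it "direct-coded".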
In my personal experience, the "best" approach to writing lexers is the hand-written approach I outlined above. It works on every platform and in every language, you need not learn ancient tools like lex, and, perhaps most importantly, you have the flexibility to extend the lexer with features that are difficult or impossible to implement with a tool. For example, perhaps you want your lexer to understand Python-esque block indentation, or maybe you need to dynamically extend certain token classes.