Extract Token List from OCamllex lexbuf

I am writing a Python interpreter in OCaml using ocamllex, and in order to handle the indentation-based syntax, I want to

  1. tokenize the input using ocamllex
  2. iterate through the list of lexed tokens and insert INDENT and DEDENT tokens as needed for the parser (roughly the pass sketched just after this list)
  3. parse this list into an AST
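
For reference, the indentation pass in step 2 would look roughly like the sketch below. This is only a sketch: it assumes the scanner emits one TAB token per leading tab and an EOL token at the end of each line, and that Parser.token already has INDENT and DEDENT constructors.

    (* Rough sketch of step 2: walk the token list and splice in
       INDENT/DEDENT tokens wherever the number of leading TABs changes.
       Assumes one TAB token per leading tab and an EOL token per line. *)
    let insert_indentation (tokens : Parser.token list) : Parser.token list =
      (* count the TAB tokens that start a line *)
      let rec count_tabs n = function
        | Parser.TAB :: rest -> count_tabs (n + 1) rest
        | rest -> (n, rest)
      in
      (* emit the INDENTs or DEDENTs needed to move between nesting depths *)
      let rec adjust cur next =
        if next > cur then Parser.INDENT :: adjust (cur + 1) next
        else if next < cur then Parser.DEDENT :: adjust (cur - 1) next
        else []
      in
      let rec go depth = function
        | [] -> adjust depth 0  (* close any still-open blocks at the end *)
        | Parser.EOL :: rest ->
            let next, rest' = count_tabs 0 rest in
            (Parser.EOL :: adjust depth next) @ go next rest'
        | tok :: rest -> tok :: go depth rest
      in
      let first, rest = count_tabs 0 tokens in
      adjust 0 first @ go first rest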

However, ocamllex only generates a function that returns one token at a time from a lexbuf, so there is no token list to iterate over for the indentation checking. Is there a good way to extract a list of tokens from a lexbuf, i.e.

    let lexbuf = (Lexing.from_channel stdin) in
    let token_list = tokenize lexbuf

where token_list has type Parser.token list? My hack was to define a trivial parser like

tokenize: /* used by the parser to read the input into the indentation function */
  | token EOL { $1 @ [EOL] }
  | EOL { SEP :: [EOL] }

token:
  | COLON { [COLON] }
  | TAB { [TAB] }
  | RETURN { [RETURN] }
   ...
  | token token %prec RECURSE { $1 @ $2 }

and to call this like

    let lexbuf = (Lexing.from_channel stdin) in
    let temp = (Parser.tokenize Scanner.token) lexbuf in (* char buffer to token list *)

but this approach causes all sorts of shift/reduce conflicts and adds unnecessary complexity. Is there a better way to write a Lexing.lexbuf -> Parser.token list function in OCaml?
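
For example, would driving the generated lexer directly in a loop, as in the sketch below, be the idiomatic way? This assumes Scanner.token is the ocamllex entry point and that the scanner returns an EOF token when the input ends (the EOF handling here is my own assumption).

    (* Sketch: call the lexer repeatedly and collect the results in a list,
       assuming the scanner produces an EOF token at end of input. *)
    let tokenize (lexbuf : Lexing.lexbuf) : Parser.token list =
      let rec loop acc =
        match Scanner.token lexbuf with
        | Parser.EOF -> List.rev (Parser.EOF :: acc)
        | tok -> loop (tok :: acc)
      in
      loop []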
