Extract Token List from OCamllex lexbuf

Asked by JAustin on 04 November 2018 at 18:19 (599 views)

I am writing a Python interpreter in OCaml using ocamllex. To handle the indentation-based syntax, I want to:

1. tokenize the input using ocamllex,
2. iterate through the list of lexed tokens and insert INDENT and DEDENT tokens as needed for the parser, and
3. parse this list into an AST.
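For illustration, the indentation pass in step 2 might look roughly like the sketch below. It rests on several assumptions that are not part of my actual code: the `token` type is a hypothetical stand-in for `Parser.token`, indentation is assumed to arrive as leading TAB tokens on each line, and INDENT, DEDENT and EOF are assumed to be declared in the grammar. It keeps a stack of indentation levels in the way Python's own tokenizer does, but it does not report inconsistent dedents.

```ocaml
(* Hypothetical stand-in for Parser.token; the real type is generated
   from the grammar. INDENT, DEDENT and EOF are assumed to exist there. *)
type token = COLON | TAB | RETURN | EOL | INDENT | DEDENT | EOF

(* Strip the leading TAB tokens of a line; return their count and the rest. *)
let rec count_tabs n = function
  | TAB :: rest -> count_tabs (n + 1) rest
  | rest -> (n, rest)

(* Replace leading TABs with INDENT/DEDENT tokens.
   [stack] holds the open indentation levels, innermost first;
   [acc] holds the output in reverse order. *)
let insert_indentation (tokens : token list) : token list =
  (* Pop every level deeper than [level], emitting one DEDENT per pop. *)
  let rec dedent level stack acc =
    match stack with
    | top :: rest when top > level -> dedent level rest (DEDENT :: acc)
    | _ -> (stack, acc)
  in
  let rec line_start stack acc toks =
    let level, rest = count_tabs 0 toks in
    match rest with
    | [] -> List.rev (snd (dedent 0 stack acc))
    | EOF :: _ -> List.rev (EOF :: snd (dedent 0 stack acc))
    | EOL :: rest' ->
      (* Blank line: it does not change the indentation level. *)
      line_start stack (EOL :: acc) rest'
    | _ ->
      let top = match stack with t :: _ -> t | [] -> 0 in
      let stack, acc =
        if level > top then (level :: stack, INDENT :: acc)
        else dedent level stack acc
      in
      line_body stack acc rest
  and line_body stack acc toks =
    match toks with
    | EOL :: rest -> line_start stack (EOL :: acc) rest
    | [] | EOF :: _ -> line_start stack acc toks
    | tok :: rest -> line_body stack (tok :: acc) rest
  in
  line_start [] [] tokens
```

Note that this sketch drops the leading TAB tokens once they have been converted, and passes TAB tokens in the middle of a line through unchanged.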

However, an ocamllex lexer works on a lexbuf and hands out tokens one at a time, so there is no token list that I can iterate through to do the indentation checking. Is there a good way to extract a list of tokens from a lexbuf, i.e.

let lexbuf = (Lexing.from_channel stdin) in
let token_list = tokenize lexbuf

where token_list has type Parser.token list? My hack was to define a trivial parser like

tokenize: /* used by the parser to read the input into the indentation function */
  | token EOL { $1 @ [EOL] }
  | EOL { SEP :: [EOL] }

token:
  | COLON { [COLON] }
  | TAB { [TAB] }
  | RETURN { [RETURN] }
   ...
  | token token %prec RECURSE { $1 @ $2 }

and to call this like

    let lexbuf = (Lexing.from_channel stdin) in
    let temp = (Parser.tokenize Scanner.token) lexbuf in (* char buffer to token list *)

but this causes all sorts of shift/reduce conflicts and adds unnecessary complexity. Is there a better way to write a function of type lexbuf -> Parser.token list in OCaml?
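To make the shape of what I am after concrete, here is a minimal sketch of one possible approach: call the lexer in a loop until it returns an end-of-input token, then wrap the processed list in a closure so it can be handed back to the parser. Everything here is an assumption for illustration: `token` is a hypothetical stand-in for `Parser.token`, an `EOF` token is assumed to exist, and `toy_lexer` only imitates `Scanner.token` so that the example runs without ocamllex.

```ocaml
(* Hypothetical stand-in for Parser.token; an EOF token is assumed. *)
type token = COLON | TAB | RETURN | EOL | EOF

(* Call the lexer until it yields EOF, collecting the tokens in order.
   [next_token] plays the role of Scanner.token. *)
let tokenize (next_token : Lexing.lexbuf -> token)
    (lexbuf : Lexing.lexbuf) : token list =
  let rec loop acc =
    match next_token lexbuf with
    | EOF -> List.rev (EOF :: acc)
    | tok -> loop (tok :: acc)
  in
  loop []

(* Turn a token list back into a function of the type that a generated
   parser entry point expects. It ignores the lexbuf and keeps
   returning EOF once the list is exhausted. *)
let lexer_of_list (tokens : token list) : Lexing.lexbuf -> token =
  let remaining = ref tokens in
  fun _lexbuf ->
    match !remaining with
    | [] -> EOF
    | tok :: rest -> remaining := rest; tok

(* Toy replacement for an ocamllex-generated lexer, for demonstration
   only: it reads [src] directly and ignores the lexbuf. *)
let toy_lexer (src : string) : Lexing.lexbuf -> token =
  let pos = ref 0 in
  fun _lexbuf ->
    if !pos >= String.length src then EOF
    else begin
      let c = src.[!pos] in
      incr pos;
      match c with
      | ':' -> COLON
      | '\t' -> TAB
      | '\n' -> EOL
      | _ -> RETURN
    end
```

With a real lexer and parser this would presumably be used as `let tokens = tokenize Scanner.token lexbuf`, followed by something like `Parser.program (lexer_of_list tokens) (Lexing.from_string "")`, where `Parser.program` is a hypothetical entry point name. Since the dummy lexbuf is never advanced, source positions in parser error messages would be lost with this approach.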

There are 0 answers