I am writing a Python interpreter in OCaml using ocamllex, and in order to handle the indentation-based syntax, I want to
- tokenize the input using ocamllex
- iterate through the list of lexed tokens and insert INDENT and DEDENT tokens as needed for the parser
- parse this list into an AST
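The second step can be written as an ordinary recursive function over the token list. The sketch below uses the stack-of-levels rule from Python's lexical analysis. When a line is indented further, it pushes the new level and emits INDENT. When a line is indented less, it pops levels and emits one DEDENT per pop, and it fails if the new level matches no open one. The sketch rests on a few assumptions:

- A small hypothetical token type stands in for Parser.token.
- Indentation reaches this pass as leading TAB tokens after each EOL.
- The list ends with EOF.

`Indentation_error` and the function names are made up for illustration.

```ocaml
(* Hypothetical token type standing in for Parser.token; the real one is
   generated from the %token declarations in the grammar. *)
type token =
  | NAME of string
  | COLON
  | TAB
  | EOL
  | INDENT
  | DEDENT
  | EOF

(* Hypothetical error for a dedent that matches no enclosing level. *)
exception Indentation_error of string

(* Strip the leading TAB tokens of a line, returning (tab count, rest). *)
let rec count_tabs n = function
  | TAB :: rest -> count_tabs (n + 1) rest
  | rest -> (n, rest)

(* Insert INDENT/DEDENT tokens. Assumes indentation is lexed as leading TABs
   after each EOL and that the input ends with EOF. *)
let insert_indents (tokens : token list) : token list =
  (* [stack]: open indentation levels, innermost first; the last element is 0.
     [acc]: output tokens in reverse order. *)
  let rec at_line_start stack acc tokens =
    let level, rest = count_tabs 0 tokens in
    match rest with
    | EOL :: rest' ->
        (* blank line: its indentation and EOL are dropped *)
        at_line_start stack acc rest'
    | [] | EOF :: _ ->
        (* end of input: close every block that is still open, then EOF *)
        let dedents = List.init (List.length stack - 1) (fun _ -> DEDENT) in
        List.rev_append acc (dedents @ [EOF])
    | _ ->
        if level > List.hd stack then in_line (level :: stack) (INDENT :: acc) rest
        else begin
          (* pop every level deeper than [level], one DEDENT per pop *)
          let rec pop stack acc =
            match stack with
            | top :: below when level < top -> pop below (DEDENT :: acc)
            | top :: _ when level = top -> (stack, acc)
            | _ -> raise (Indentation_error "unindent does not match any outer level")
          in
          let stack, acc = pop stack acc in
          in_line stack acc rest
        end
  (* copy tokens of the current line until EOL or EOF *)
  and in_line stack acc tokens =
    match tokens with
    | EOL :: rest -> at_line_start stack (EOL :: acc) rest
    | [] | EOF :: _ -> at_line_start stack acc []
    | tok :: rest -> in_line stack (tok :: acc) rest
  in
  at_line_start [0] [] tokens
```

For example, the tokens of `if x:` followed by a line with one tab and `y` come out as `NAME "if"; NAME "x"; COLON; EOL; INDENT; NAME "y"; EOL; DEDENT; EOF`. The pass closes any blocks still open at the end before EOF, so the parser always sees balanced INDENT/DEDENT pairs.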
However, the lexer that ocamllex generates doesn't give me a list. It is a function that returns one token per call as it reads from a lexbuf, and it can't easily be iterated over to do the indentation checking. Is there a good way to extract a list of tokens from a lexbuf, so that I could write

```ocaml
let lexbuf = Lexing.from_channel stdin in
let token_list = tokenize lexbuf
```

where token_list has type Parser.token list?

My hack was to define a trivial parser like this:

```
tokenize: /* used by the parser to read the input into the indentation function */
| token EOL { $1 @ [EOL] }
| EOL { SEP :: [EOL] }
token:
| COLON { [COLON] }
| TAB { [TAB] }
| RETURN { [RETURN] }
...
| token token %prec RECURSE { $1 @ $2 }
```

and to call it like this:

```ocaml
let lexbuf = Lexing.from_channel stdin in
let temp = Parser.tokenize Scanner.token lexbuf in (* char buffer to token list *)
```

but this runs into shift/reduce conflicts and adds a lot of unnecessary complexity. Is there a better way to write a Lexing.lexbuf -> Parser.token list function in OCaml?
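A list can be pulled out of a lexbuf without a helper grammar. The approach is to call the ocamllex-generated function in a loop until it returns EOF. The sketch below assumes three things:

- `Scanner.token` returns an EOF token at end of input, for example via a rule like `| eof { EOF }`.
- A hypothetical token type stands in for Parser.token so the block compiles on its own.
- `lexer_of_list` is an illustrative helper for the opposite direction: it feeds the rewritten list back to the parser.

```ocaml
(* Hypothetical token type standing in for Parser.token. *)
type token =
  | NAME of string
  | COLON
  | EOL
  | EOF

(* Call the lexer until it returns EOF and collect every token, EOF included,
   in source order. In the real program [next_token] would be Scanner.token. *)
let tokenize (next_token : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf) : token list =
  let rec loop acc =
    match next_token lexbuf with
    | EOF -> List.rev (EOF :: acc)
    | tok -> loop (tok :: acc)
  in
  loop []

(* Hypothetical helper for the reverse direction: wrap a token list in a
   function of the shape an ocamlyacc entry point expects. The lexbuf argument
   is ignored, and EOF is returned forever once the list is exhausted. *)
let lexer_of_list (tokens : token list) : Lexing.lexbuf -> token =
  let remaining = ref tokens in
  fun _lexbuf ->
    match !remaining with
    | tok :: rest ->
        remaining := rest;
        tok
    | [] -> EOF
```

With the real modules, the first step would read roughly `let tokens = tokenize Scanner.token (Lexing.from_channel stdin)`. After the indentation pass, the list could be handed to the ocamlyacc entry point as `Parser.main (lexer_of_list tokens) lexbuf`, where `main` stands for whatever the start symbol is called. That function ignores its lexbuf, so position information such as `Parsing.symbol_start` will no longer line up with the tokens.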