Feed ocamlyacc parser from explicit token list?

1.2k views Asked by At

Is it possible to feed an OCamlYacc-generated parser an explicit token list for analysis?

I'd like to use OCamlLex to explicitly generate a token list which I then analyze using a Yacc-generated parser later. However, the standard use case generates a parser that calls a lexer implicitly for the next token. Here tokens are computed during the yacc analysis rather than before. Conceptually a parser should only work on tokens but a Yacc-generated parser provides an interface that relies on a lexer which in my case I don't need.

3

There are 3 answers

1
Victor Nicollet On BEST ANSWER

If you already have a list of tokens, you can just go the ugly way and ignore the lexing buffer altogether. After all, the parse-from-lexbuf function that your parser expects is a non-pure function :

let my_tokens = ref [ (* WHATEVER *) ]
let token lexbuf = 
  match !my_tokens with 
    | []     -> EOF 
    | h :: t -> my_tokens := t ; h 

let ast = Parser.parse token (Lexbuf.from_string "")

On the other hand, it looks from your comments that you actually have a function of type Lexing.lexbuf -> token list that you're trying to fit into the Lexing.lexbuf -> token signature of your parser. If that is the case, you can easily use a queue to write a converter between the two types:

let deflate token = 
  let q = Queue.create () in
  fun lexbuf -> 
    if not (Queue.is_empty q) then Queue.pop q else   
      match token lexbuf with 
        | [   ] -> EOF 
        | [tok] -> tok
        | hd::t -> List.iter (fun tok -> Queue.add tok q) t ; hd 

let ast = Parser.parse (deflate my_lexer) lexbuf
1
gasche On

As already mentioned by Jeffrey, Menhir specifically offers, as part of its runtime library, a module to the parsers with any kind of token stream (it just asks for a unit -> token function): MenhirLib.Convert.

(You could even use this code without using Menhir, with ocamlyacc instead. In practice the conversion is not terribly complicated so you could even re-implement it yourself.)

1
Jeffrey Scofield On

The OCamlYacc interface does look pretty complicated; it seems to require a Lexing.lexbuf. Maybe you could consider using Lexing.from_string to feed a fixed string rather than a fixed sequence of tokens. You could also look at Menhir. I haven't used it, but it gets excellent reviews here whenever anybody mentions OCaml parser generators. It might have a more flexible lexing interface.