I have implemented the Earley parser using a queue to process states. The queue is seeded with the top-level rule. For each state in the queue, one of the operations (prediction, scanning, completion) is performed by adding new states to the queue. Duplicate states are not added.
The problem I am having is best described with the following grammar:
When parsing A, the following happens:
As you can tell, A will not be fully resolved. This is because completion with the epsilon state happens only once: when that state is predicted a second time it is a duplicate, so it is not added to the queue again.
How can I adapt my algorithm to support these epsilon-states?
Edit: Note that this is not an issue when using terminals as a new chart set will be created to insert the scanned state. As the state does not exist there already, it will be processed.
In the paper "Practical Earley Parsing" by John Aycock and R. Nigel Horspool, the authors propose the following as a way of handling nullable nonterminals:
(Emphasis in original.) So in your example, in the prediction of A → • B B the following rules would be produced:
(1) B → •
(2) A → B • B
(3) A → B B •
The key is that this happens in the prediction phase. During prediction, if the symbol after the dot is nullable (either directly or transitively, through other nullable nonterminals), then also move the dot one position to the right and add that rule as well.
So basically:
A → • B B produces (B → • and A → B • B) each being queued and processed
A → B • B produces (A → B B •) which is queued and processed
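To make this concrete, here is a minimal sketch of a queue-based recognizer with that prediction rule. It is not your implementation: the state layout `(lhs, body, dot, origin)`, the function names, the `advance_nullable` flag, and the extra rule B → b (added only so that scanning is exercised) are all assumptions made for illustration. The rules A → B B and B → ε are taken from the items above.

```python
from collections import deque

def nullable_symbols(grammar):
    """Fixed-point computation of the nonterminals that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, bodies in grammar.items():
            if lhs in nullable:
                continue
            # An empty body is nullable (all() over nothing is True); so is a
            # body whose symbols are all already known to be nullable.
            if any(all(sym in nullable for sym in body) for body in bodies):
                nullable.add(lhs)
                changed = True
    return nullable

def recognize(grammar, start, tokens, advance_nullable=True):
    """Return True if `tokens` can be derived from `start`.

    A state is the tuple (lhs, body, dot, origin). With advance_nullable=False
    the predictor behaves like the one described in the question.
    """
    nullable = nullable_symbols(grammar)
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]     # states seen per position
    queues = [deque() for _ in range(n + 1)]  # states still to process

    def add(i, state):
        # Duplicate states are never queued a second time.
        if state not in chart[i]:
            chart[i].add(state)
            queues[i].append(state)

    for body in grammar[start]:
        add(0, (start, body, 0, 0))

    for i in range(n + 1):
        while queues[i]:
            lhs, body, dot, origin = queues[i].popleft()
            if dot == len(body):
                # Completion: advance every state in the origin set that was
                # waiting for `lhs`.
                for plhs, pbody, pdot, porigin in list(chart[origin]):
                    if pdot < len(pbody) and pbody[pdot] == lhs:
                        add(i, (plhs, pbody, pdot + 1, porigin))
            elif body[dot] in grammar:
                # Prediction: add the rules of the nonterminal after the dot.
                sym = body[dot]
                for alt in grammar[sym]:
                    add(i, (sym, alt, 0, i))
                # The fix: if that nonterminal is nullable, also move the dot
                # over it right away.
                if advance_nullable and sym in nullable:
                    add(i, (lhs, body, dot + 1, origin))
            elif i < n and tokens[i] == body[dot]:
                # Scanning: the terminal matches, so move to the next set.
                add(i + 1, (lhs, body, dot + 1, origin))

    return any(lhs == start and dot == len(body) and origin == 0
               for lhs, body, dot, origin in chart[n])

# Rule bodies are tuples so that states are hashable; () is the epsilon body.
grammar = {"A": [("B", "B")], "B": [(), ("b",)]}
```

With this sketch, `recognize(grammar, "A", [])` accepts the empty input, while `recognize(grammar, "A", [], advance_nullable=False)` stops at A → B • B, which is the behaviour described in the question. The duplicate check stays exactly as it was; only the predictor changes.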