I have a lot of parse trees like this:
( S ( NP-SBJ ( PRP I ) ) ( INODE@S ( VP ( VBP have ) ( NP ( DT a ) ( INODE@NP ( NN savings ) ( NN account ) ) ) ) ( . . ) ) )
for a sentence like this: "I have a savings account ."
I need to extract all derivation rules from these trees. The derivation rules like:
S -> NP-SBJ INODE@S
NP-SBJ -> PRP
PRP -> I
INODE@S -> VP NP
and so on.
Is there any prepared code (preferably in java) or pseudo code for this purpose?
Edit:
I think this problem is very general and common in many areas. The simplified problem is to find each parent and it's children from a parenthesis tree.
I wrote this for python. I believe you can read it as a pseudocode.
I will edit the post for Java later. I added the Java implementation later.In short, start from simplest grammar you can find and replace them with their left-hand-side and continue. e.g. find
(PRP I)
save it, then replace it withPRP
. repeat until you find all the syntax.Update: The Java implementation is a bit different but it's the same idea. The full code is here: http://ideone.com/0eE8bd
the output: