Training Stanford RSS and Shift Reduce parsers for a new language


I would like to train the constituency-based Stanford parsers (RSS and Shift Reduce) on an existing treebank, but I cannot find enough information online to do so. Two key questions:

  1. In what format should I export my treebank to be able to train each parser? (I notice that "Standard Treebank format" should be used for the SR parser, but I cannot find a specification of what this format looks like. If it is the same format used by the Penn Treebank, how should the trees be split up? In a single file, separated by newlines? In separate files?)

  2. I am attempting this programmatically, by writing Java code in an IDE. Assuming I now have the correct files, how would I go about training each of these parsers? Which methods should be called, and in what order?

I cannot figure this out from the source code or Javadocs for each of these parsers. Any advice would be greatly appreciated.
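
To make the first question concrete, here is a minimal sketch of the export I have in mind. It assumes (and this is exactly what I have not been able to confirm) that the expected input is Penn Treebank-style bracketing, with one tree per line in a single file. The class and method names (`TreebankExportSketch`, `toSingleLine`, `isBalanced`, `export`) are my own and hypothetical; nothing here uses the Stanford API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TreebankExportSketch {

    // Collapse every run of whitespace so that one tree occupies exactly one line.
    static String toSingleLine(String tree) {
        return tree.trim().replaceAll("\\s+", " ");
    }

    // True if the string is a single bracketed tree with balanced parentheses.
    static boolean isBalanced(String tree) {
        String t = tree.trim();
        if (t.isEmpty() || t.charAt(0) != '(') {
            return false;
        }
        int depth = 0;
        for (int i = 0; i < t.length(); i++) {
            char c = t.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // closing bracket without a matching opener
                }
                if (depth == 0 && i != t.length() - 1) {
                    return false; // text continues after the tree has closed
                }
            }
        }
        return depth == 0;
    }

    // Build the file content: one validated tree per line (assumed layout).
    static String export(List<String> trees) {
        List<String> lines = new ArrayList<>();
        for (String tree : trees) {
            if (!isBalanced(tree)) {
                throw new IllegalArgumentException("Malformed tree: " + tree);
            }
            lines.add(toSingleLine(tree));
        }
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        List<String> trees = Arrays.asList(
            "(ROOT\n  (S (NP (DT The) (NN dog))\n     (VP (VBZ barks))))",
            "(ROOT (S (NP (NNS Cats)) (VP (VBP sleep))))");
        // The result could then be written to a single file, e.g. with java.nio.file.Files.
        System.out.println(export(trees));
    }
}
```

If the parsers instead expect one tree per file, or a different bracketing convention, only `export` would need to change.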


There are 0 answers