How do I train one OpenNLP named entity model from multiple files in DKPro Core?


How do I train one model from multiple files in DKPro Core?

After annotating many documents in WebAnno and exporting them in XMI format, I tried to create a model with this code:

    // Output file for the trained model.
    File model = new File("/tmp/", "model.bin");

    SimplePipeline.runPipeline(
            // Reader: picks up every *.xmi file under /tmp/.
            CollectionReaderFactory.createReaderDescription(XmiReader.class,
                    ResourceCollectionReaderBase.PARAM_SOURCE_LOCATION, "/tmp/",
                    ResourceCollectionReaderBase.PARAM_PATTERNS,
                    ResourceCollectionReaderBase.INCLUDE_PREFIX + "*.xmi"),
            // Trainer: OpenNLP named entity model for Portuguese, written to model.bin.
            AnalysisEngineFactory.createEngineDescription(OpenNlpNamedEntityRecognizerTrainer.class,
                    OpenNlpNamedEntityRecognizerTrainer.PARAM_TARGET_LOCATION, model,
                    OpenNlpNamedEntityRecognizerTrainer.PARAM_LANGUAGE, "pt"));

The problem is that, although it opened all of the annotated files, the model appears to have been trained on only one of them.


1 answer

Answered by rec

The reader opens all files and sends them one-by-one to the trainer. The trainer learns from all of them and produces a single output model. That is why you only see one output file.

If you wanted to create one model per input file, you'd have to create a loop which passes the files one-by-one to the reader.
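The loop described above could be sketched roughly as follows. This is only an outline using the Java standard library: the class `PerFileModels`, the `Trainer` callback and the `.bin` naming scheme are hypothetical names made up for illustration and are not part of DKPro Core. The callback stands in for a `SimplePipeline.runPipeline(...)` call like the one in the question, with the reader pointed at a single file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

class PerFileModels {

    /** Hypothetical callback: trains one model from exactly one XMI file. */
    public interface Trainer {
        void train(Path xmiFile, Path modelFile) throws Exception;
    }

    /** Maps every *.xmi file in sourceDir to its own model file in modelDir. */
    public static Map<Path, Path> plan(Path sourceDir, Path modelDir) throws IOException {
        Map<Path, Path> plan = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(sourceDir, "*.xmi")) {
            for (Path xmi : files) {
                String name = xmi.getFileName().toString();
                String base = name.substring(0, name.length() - ".xmi".length());
                // Assumed naming scheme: foo.xmi -> foo.bin
                plan.put(xmi, modelDir.resolve(base + ".bin"));
            }
        }
        return plan;
    }

    /** Invokes the trainer once per input file, so each run sees one document only. */
    public static void trainEach(Path sourceDir, Path modelDir, Trainer trainer)
            throws Exception {
        for (Map.Entry<Path, Path> entry : plan(sourceDir, modelDir).entrySet()) {
            // In a real setup the callback would run the pipeline from the question,
            // with PARAM_SOURCE_LOCATION set to entry.getKey() and
            // PARAM_TARGET_LOCATION set to entry.getValue().
            trainer.train(entry.getKey(), entry.getValue());
        }
    }
}
```

Each call then gets its own reader and its own target location, so no model ever sees more than one input file. For the original goal of one model from all files, no loop is needed.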