How do I train one model from multiple files in DKPro Core?
After annotating many documents in WebAnno and exporting them in XMI format, I tried to create a model with this code:
File model = new File("/tmp/", "model.bin");
SimplePipeline.runPipeline(
    CollectionReaderFactory.createReaderDescription(XmiReader.class,
        ResourceCollectionReaderBase.PARAM_SOURCE_LOCATION, "/tmp/",
        ResourceCollectionReaderBase.PARAM_PATTERNS, ResourceCollectionReaderBase.INCLUDE_PREFIX + "*.xmi"),
    AnalysisEngineFactory.createEngineDescription(OpenNlpNamedEntityRecognizerTrainer.class,
        OpenNlpNamedEntityRecognizerTrainer.PARAM_TARGET_LOCATION, model,
        OpenNlpNamedEntityRecognizerTrainer.PARAM_LANGUAGE, "pt"));
The problem is that, although it opened all of the annotated files, it looks as if only one file was used for training.
The reader opens all files and sends them one-by-one to the trainer. The trainer learns from all of them and produces a single output model. That is why you only see one output file.
If you wanted to create one model per input file instead, you would have to write a loop that runs the pipeline once per file, pointing the reader at a single file each time.
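The per-file loop could be organised roughly as follows. This is only a sketch of the bookkeeping: the class name `PerFileModels`, the helper `planModels`, and the `<name>.model.bin` naming scheme are made up for illustration and are not part of DKPro Core. The training call itself is left as a comment, on the assumption that you reuse the pipeline from the question with a narrower pattern and a different target file on each iteration.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class PerFileModels {

    /** Maps each *.xmi file in dir to the (hypothetical) model file to train from it. */
    static Map<Path, File> planModels(Path dir) throws IOException {
        Map<Path, File> plan = new TreeMap<>();
        try (DirectoryStream<Path> xmis = Files.newDirectoryStream(dir, "*.xmi")) {
            for (Path xmi : xmis) {
                String name = xmi.getFileName().toString();
                String base = name.substring(0, name.length() - ".xmi".length());
                plan.put(xmi, new File(dir.toFile(), base + ".model.bin"));
            }
        }
        return plan;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/tmp");
        for (Map.Entry<Path, File> entry : planModels(dir).entrySet()) {
            // Run the pipeline from the question here, once per entry, with
            //   PARAM_PATTERNS        = INCLUDE_PREFIX + entry.getKey().getFileName()
            //   PARAM_TARGET_LOCATION = entry.getValue()
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Each iteration then starts a fresh reader and trainer, so every model only sees the one document it was given. For the usual case of one model trained on everything, the single pipeline from the question is already what you want.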