I'm not able to run a UIMA Ruta script in my simple pipeline. I'm working with the following libraries:
- Uimafit 2.0.0
- Uima-ruta 2.0.1
- ClearTK 1.4.1
- Maven
I'm using org.apache.uima.fit.pipeline.SimplePipeline as follows:
SimplePipeline.runPipeline(
    UriCollectionReader.getCollectionReaderFromDirectory(filesDirectory), // directory with text files
    UriToDocumentTextAnnotator.getDescription(),
    StanfordCoreNLPAnnotator.getDescription(), // Stanford tokenize, ssplit, pos, lemma, ner, parse, dcoref
    AnalysisEngineFactory.createEngineDescription(RUTA_ANALYSIS_ENGINE), // Ruta script
    AnalysisEngineFactory.createEngineDescription(//
        XWriter.class,
        XWriter.PARAM_OUTPUT_DIRECTORY_NAME, outputDirectory,
        XWriter.PARAM_FILE_NAMER_CLASS_NAME, ViewURIFileNamer.class.getName())
);
What I'm trying to do is use the StanfordCoreNLP annotator (from ClearTK) and then apply a Ruta script. Currently, everything runs without errors and the default Ruta annotations are added to the CAS, but the annotations that my rules create are not.
My script is:
PACKAGE edu.isistan.carcha.concern;
TYPESYSTEM org.cleartk.ClearTKTypeSystem;
DECLARE persistence
Token{FEATURE("lemma","storage") -> MARK(persistence)};
Looking at the annotated file, the basic Ruta annotations such as "SPACE" or "SW" are there, so the RutaEngine is being created and added to the pipeline.
How do I properly create an AnalysisEngineDescriptor to run a Ruta script?
Note: RUTA_ANALYSIS_ENGINE is the engine descriptor that I copied from the Ruta Workbench.
Try adding a semicolon after the declaration and using a fully qualified name for the Token annotation. Type aliasing in Ruta is a little too aggressive: every type known to your pipeline is available by its short name, even if you do not import it in your script. If more than one Token type is available to your pipeline, there is currently no way to know which one will be picked (see https://issues.apache.org/jira/browse/UIMA-3322?filter=-2).
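Applied to the script in the question, the two changes might look like the sketch below. The qualified name org.cleartk.token.type.Token is an assumption about which Token type the ClearTK type system declares, so check it against the type system your pipeline actually loads before relying on it.

```
PACKAGE edu.isistan.carcha.concern;
TYPESYSTEM org.cleartk.ClearTKTypeSystem;

// The declaration is now terminated with a semicolon.
DECLARE persistence;

// Assumed fully qualified ClearTK token type, used in place of the ambiguous short name Token.
org.cleartk.token.type.Token{FEATURE("lemma","storage") -> MARK(persistence)};
```

With the qualified name, the rule no longer depends on which Token type the short-name alias happens to resolve to.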