I'm trying to embed Apache cTAKES NLP logic into my application.
First of all, I can't find any good documentation explaining how this can be done.
From various pieces of code I found online, I put together the following test code:
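import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

import com.google.gson.Gson;
import org.apache.ctakes.assertion.medfacts.cleartk.ConditionalCleartkAnalysisEngine;
import org.apache.ctakes.assertion.medfacts.cleartk.GenericCleartkAnalysisEngine;
import org.apache.ctakes.assertion.medfacts.cleartk.HistoryCleartkAnalysisEngine;
import org.apache.ctakes.assertion.medfacts.cleartk.PolarityCleartkAnalysisEngine;
import org.apache.ctakes.assertion.medfacts.cleartk.SubjectCleartkAnalysisEngine;
import org.apache.ctakes.assertion.medfacts.cleartk.UncertaintyCleartkAnalysisEngine;
import org.apache.ctakes.contexttokenizer.ae.ContextDependentTokenizerAnnotator;
import org.apache.ctakes.core.ae.SentenceDetector;
import org.apache.ctakes.core.ae.SimpleSegmentAnnotator;
import org.apache.ctakes.core.ae.TokenizerAnnotatorPTB;
import org.apache.ctakes.dependency.parser.ae.ClearNLPDependencyParserAE;
import org.apache.ctakes.dictionary.lookup2.ae.DefaultJCasTermAnnotator;
import org.apache.ctakes.lvg.ae.LvgAnnotator;
import org.apache.ctakes.postagger.POSTagger;
import org.apache.uima.UIMAException;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AggregateBuilder;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.cas.TOP;
import org.apache.uima.resource.ResourceInitializationException;

public class CTAKESTest {

    public static void main(String[] args) throws UIMAException, MalformedURLException {
        final String note = "Serum Cholesterol 154 150 250 mgs/dl\n-\nSerum Triglycerides 67 90 200 mgs /dl\n-\nSerum HDL: Cholesterol 38 35 55 mgs /dl\n-\nSerum LDL: Cholesterol 49 85 150 mgs/d1\n-\nSerum VLDL: Cholesterol 13 10 40 mgs/dl\n-\nTotal Cholesterol / HDL Ratio: 3.90";
        final JCas jcas = JCasFactory.createJCas();
        jcas.setDocumentText(note);

        // Build the aggregate description and run it over the single JCas
        final AnalysisEngineDescription aed = getFastPipeline();
        SimplePipeline.runPipeline(jcas, aed);

        // Collect every feature structure produced by the pipeline
        Collection<TOP> codes = JCasUtil.selectAll(jcas);
        List<TOP> list = new ArrayList<>(codes);
        TOP[] res = list.toArray(new TOP[list.size()]);
        // System.out.println(Arrays.toString(res));
        String json = new Gson().toJson(res);
        System.out.println(json);
    }

    public static AnalysisEngineDescription getFastPipeline()
            throws ResourceInitializationException, MalformedURLException {
        AggregateBuilder builder = new AggregateBuilder();
        builder.add(getTokenProcessingPipeline());
        builder.add(DefaultJCasTermAnnotator.createAnnotatorDescription());
        builder.add(ClearNLPDependencyParserAE.createAnnotatorDescription());
        builder.add(PolarityCleartkAnalysisEngine.createAnnotatorDescription());
        builder.add(UncertaintyCleartkAnalysisEngine.createAnnotatorDescription());
        builder.add(HistoryCleartkAnalysisEngine.createAnnotatorDescription());
        builder.add(ConditionalCleartkAnalysisEngine.createAnnotatorDescription());
        builder.add(GenericCleartkAnalysisEngine.createAnnotatorDescription());
        builder.add(SubjectCleartkAnalysisEngine.createAnnotatorDescription());
        return builder.createAggregateDescription();
    }

    public static AnalysisEngineDescription getTokenProcessingPipeline()
            throws ResourceInitializationException, MalformedURLException {
        AggregateBuilder builder = new AggregateBuilder();
        builder.add(SimpleSegmentAnnotator.createAnnotatorDescription());
        builder.add(SentenceDetector.createAnnotatorDescription());
        builder.add(TokenizerAnnotatorPTB.createAnnotatorDescription());
        builder.add(LvgAnnotator.createAnnotatorDescription());
        builder.add(ContextDependentTokenizerAnnotator.createAnnotatorDescription());
        builder.add(POSTagger.createAnnotatorDescription());
        return builder.createAggregateDescription();
    }
}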
But it fails at startup with the following error:
08:37:01.978 [main] INFO o.apache.ctakes.lvg.ae.LvgAnnotator - URL for lvg.properties =file:/C:/Users/Alex/.m2/repository/net/sourceforge/ctakesresources/ctakes-resources-lvg2008/4.0.0/ctakes-resources-lvg2008-4.0.0.jar!/org/apache/ctakes/lvg/data/config/lvg.properties
08:37:03.454 [main] INFO o.a.ctakes.core.ae.SentenceDetector - Sentence detector model file: org/apache/ctakes/core/sentdetect/sd-med-model.zip
08:37:03.566 [main] INFO o.a.c.core.ae.TokenizerAnnotatorPTB - Initializing org.apache.ctakes.core.ae.TokenizerAnnotatorPTB
Exception in thread "main" java.lang.IllegalArgumentException: URI is not hierarchical
at java.io.File.<init>(Unknown Source)
at org.apache.ctakes.lvg.resource.LvgCmdApiResourceImpl.load(LvgCmdApiResourceImpl.java:65)
at org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:628)
at org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalResources(ResourceManager_impl.java:464)
at org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:193)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:157)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:131)
at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:407)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.setup(ASB_impl.java:256)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initASB(AggregateAnalysisEngine_impl.java:429)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initializeAggregateAnalysisEngine(AggregateAnalysisEngine_impl.java:373)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.initialize(AggregateAnalysisEngine_impl.java:186)
at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:279)
at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:331)
at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:448)
at org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(AnalysisEngineFactory.java:205)
at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:227)
at org.apache.uima.fit.pipeline.SimplePipeline.runPipeline(SimplePipeline.java:260)
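The top frames of the trace point at the cause: LvgCmdApiResourceImpl.load hands a resource URI to the java.io.File constructor, and the LvgAnnotator log line shows that lvg.properties was resolved inside ctakes-resources-lvg2008-4.0.0.jar. A jar: URI is opaque, and File(URI) rejects opaque URIs with exactly the message "URI is not hierarchical". The JDK-only sketch below reproduces that and shows the general pattern of copying a single jar entry to a temporary file. It is an illustration, not a cTAKES fix: judging from the trace, the LVG resource wants real files on disk, so the practical remedy is presumably to unpack the LVG resources to a directory and put that directory, rather than the jar, on the classpath (an assumption drawn from the trace, not from cTAKES documentation). ResourceFiles, fileConstructorRejects and toFile are hypothetical names.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ResourceFiles {

    /** Returns true if new File(uri) fails the same way as in the stack trace. */
    static boolean fileConstructorRejects(URI uri) {
        try {
            new File(uri);
            return false;
        } catch (IllegalArgumentException e) {
            return "URI is not hierarchical".equals(e.getMessage());
        }
    }

    /**
     * Hypothetical helper: returns a java.io.File for a resource URL. A "file:"
     * URL is converted directly; anything else (e.g. a "jar:" entry) is copied
     * to a temporary file first, because a File cannot point inside a jar.
     */
    static File toFile(URL url) throws IOException, URISyntaxException {
        URI uri = url.toURI();
        if ("file".equalsIgnoreCase(uri.getScheme())) {
            return new File(uri);
        }
        Path tmp = Files.createTempFile("resource-", ".tmp");
        tmp.toFile().deleteOnExit();
        try (InputStream in = url.openStream()) {
            Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        }
        return tmp.toFile();
    }

    public static void main(String[] args) throws Exception {
        // Same shape of URI as the lvg.properties location in the log above.
        URI jarUri = new URI("jar:file:/C:/libs/example.jar!/config/lvg.properties");
        System.out.println(fileConstructorRejects(jarUri)); // prints true
    }
}
```

Copying one file is enough for a single properties file, but a component that walks a whole data directory (as LVG appears to) needs the entire tree on disk.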
What am I doing wrong, and how can I fix it? Also, how do I properly configure cTAKES to use AggregatePlaintextFastUMLSProcessor.xml
together with the custom dictionary I'm going to create?
Have a look at this cTAKES-REST module; it meets your exact requirement. It can be invoked via a web service call, and it can also be configured to use your custom dictionary.
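Calling such a service from Java could look roughly like the sketch below, which uses only the JDK 11+ java.net.http client. The endpoint URL, the text/plain content type and the idea that the response body carries the annotations as JSON are all assumptions, not taken from the cTAKES-REST documentation; check the module's README for the actual path and parameters. CtakesRestClient, buildRequest and analyze are hypothetical names.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CtakesRestClient {

    // ASSUMPTION: placeholder host, port and path; verify against the cTAKES-REST docs.
    static final URI ENDPOINT = URI.create("http://localhost:8080/ctakes-web-rest/service/analyze");

    /** Builds a POST request that sends the raw clinical note as plain text. */
    static HttpRequest buildRequest(URI endpoint, String note) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(note))
                .build();
    }

    /** Sends the note and returns the response body; fails on any non-200 status. */
    static String analyze(HttpClient client, URI endpoint, String note) throws Exception {
        HttpResponse<String> response =
                client.send(buildRequest(endpoint, note), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String note = "Serum Cholesterol 154 150 250 mgs/dl";
        System.out.println(analyze(HttpClient.newHttpClient(), ENDPOINT, note));
    }
}
```

Keeping cTAKES behind an HTTP boundary like this means the LVG and dictionary resources only have to be set up once, on the server, instead of inside every client application.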