Annotator dependencies: UIMA Type Capabilities?


In my UIMA application, I have some annotators that must run after a certain annotator has run.

At first, I thought about aggregating these annotators together, but I have other annotators that also depend on this annotator (and on others), which makes aggregating hard and/or impractical.

I read about Type Capabilities, which, if I understood correctly, tell UIMA that certain Types (Annotations) must be present when they are declared as Input Types. I was hoping that UIMA would give me something like a warning when a pipeline is run without any Annotator whose Output Types cover the Input Types declared by the Annotators in use. Instead, the annotators keep running/processing as usual.

Is there a way to achieve what I want, or is this just unnecessary? I am currently using the SimplePipeline if that matters.

TL;DR: My goal is for Annotators to refuse to run if certain other Annotators are missing from the Pipeline or are scheduled after them.

Thanks in advance.


There are 3 answers

7
ozborn On BEST ANSWER

In UIMA, the primary way to ensure that the annotations your annotator needs are present is to aggregate annotators together. So, to answer your question, that is how you are going to achieve what you want, because what you want to do (have UIMA figure out all your dependencies through Type Capabilities and provide warnings) is neither possible nor practical with a collection of standalone annotators.

My question back to you is: why is it so hard to figure out dependencies by making aggregate annotators? Do you realize that you can aggregate aggregates? If they are functioning correctly, they should have all of their annotation dependencies intact and declare what they output, so you can use them as input when constructing your own pipeline.

1
rec On

UIMA is a very flexible framework. By default, capabilities are not used and, if present, are purely informational. However, check out e.g. the CapabilityLanguageFlowController, which its documentation describes as follows:

FlowController for the CapabilityLanguageFlow, which uses a linear flow 
but may skip some of the AEs in the flow if they do not handle the language
of the current document or if their outputs have already been produced by
a previous AE in the flow.

You could configure your aggregate to use such a flow controller and then you can benefit from capabilities.

Disclosure: I am working on the Apache UIMA project (but I haven't used the CapabilityLanguageFlowController so far).

1
Renaud On

UIMA does not enforce that certain Types (Annotations) must be present. This is a feature that allows flexibility.

You can, however, document your uimaFIT annotator with the @TypeCapability annotation. A typical use of this annotation might look something like:

@TypeCapability(
  inputs="org.apache.uima.fit.type.Token", 
  outputs="org.apache.uima.fit.type.Token:pos")

Still, this does not enforce type dependencies. What I did on one project was to implement my own checking system on top of UIMA.
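For illustration only, here is a minimal sketch of what such a checking layer might look like. It is not the code from that project and not part of UIMA or uimaFIT: the class `PipelineDependencyChecker`, its `Step` type, and the method `findUnsatisfiedInputs` are all hypothetical names. The sketch assumes you already have each annotator's input and output type names as plain strings (for example, copied from its declared capabilities) and hard-codes them here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; not part of UIMA or uimaFIT. */
final class PipelineDependencyChecker {

    /** One pipeline step: a name plus the type names it requires and produces. */
    static final class Step {
        final String name;
        final Set<String> inputs;
        final Set<String> outputs;

        Step(String name, Set<String> inputs, Set<String> outputs) {
            this.name = name;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    /**
     * Walks the steps in execution order and reports every input type
     * that no earlier step has declared as an output.
     */
    static List<String> findUnsatisfiedInputs(List<Step> pipeline) {
        Set<String> produced = new HashSet<>();
        List<String> problems = new ArrayList<>();
        for (Step step : pipeline) {
            for (String input : step.inputs) {
                if (!produced.contains(input)) {
                    problems.add(step.name + " requires " + input);
                }
            }
            // Outputs become available only to the steps that follow.
            produced.addAll(step.outputs);
        }
        return problems;
    }

    public static void main(String[] args) {
        // Assumed declarations, mirroring the @TypeCapability example above.
        Step tokenizer = new Step("Tokenizer", Set.of(), Set.of("Token"));
        Step tagger = new Step("PosTagger", Set.of("Token"), Set.of("Token:pos"));

        // Correct order: prints []
        System.out.println(findUnsatisfiedInputs(List.of(tokenizer, tagger)));
        // Tagger scheduled before the tokenizer: prints [PosTagger requires Token]
        System.out.println(findUnsatisfiedInputs(List.of(tagger, tokenizer)));
    }
}
```

You would run a check like this before handing the engines to SimplePipeline and throw an exception if the returned list is not empty. That covers both cases from the question: a required annotator that is missing, and one that is scheduled too late.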