I'm learning UIMA. I can create basic analysis engines and get results, but I find it difficult to understand the purpose of CAS consumers. How does a CAS consumer differ from an AnalysisEngine? In many of the examples I have seen, a CAS consumer does not seem to be needed. Are CAS consumers important for large applications, or can we do without them?
What exactly is the difference between AnalysisEngine and CAS Consumer?
Asked by tired and bored dev. 407 views. There are 3 answers.
The main difference is the default deployment setting. Analysis engines are configured by default to allow running in parallel, so each instance may see only some of the CASes (OperationalProperties multipleDeploymentAllowed = true).
CAS consumers are configured by default to disallow running in parallel, so the single instance sees all CASes (OperationalProperties multipleDeploymentAllowed = false).
The latter is necessary, e.g. when you want to write all results to a single file.
For example, the CPE (Collection Processing Engine) respects this flag. When configured for multi-threaded execution, the CPE keeps multiple parallel instances of each analysis engine until it reaches the first component in the pipeline with multipleDeploymentAllowed = false, which is usually a consumer. For that component and all components after it (analysis engines or consumers), only a single instance is created.
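The deployment rule described above can be sketched in plain Java. This is only an illustrative model, not UIMA code: the `Component` class, the `instanceCounts` method, and the component names are hypothetical, and the sketch assumes that a component with multipleDeploymentAllowed = false, and every component after it, gets exactly one instance.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the rule described above; none of these types are part of UIMA.
class CpeDeploymentSketch {

    // Hypothetical stand-in for a pipeline component and its deployment flag.
    static class Component {
        final String name;
        final boolean multipleDeploymentAllowed;

        Component(String name, boolean multipleDeploymentAllowed) {
            this.name = name;
            this.multipleDeploymentAllowed = multipleDeploymentAllowed;
        }
    }

    // Returns the assumed number of instances per component for a given thread count.
    static List<Integer> instanceCounts(List<Component> pipeline, int threads) {
        List<Integer> counts = new ArrayList<>();
        boolean singleInstanceOnly = false;
        for (Component c : pipeline) {
            // The first component that disallows multiple deployment ends the parallel section.
            if (!c.multipleDeploymentAllowed) {
                singleInstanceOnly = true;
            }
            counts.add(singleInstanceOnly ? 1 : threads);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Component> pipeline = List.of(
                new Component("Tokenizer", true),
                new Component("EntityAnnotator", true),
                new Component("SingleFileWriter", false), // behaves like a CAS consumer
                new Component("PostProcessor", true));    // after the consumer: single instance
        List<Integer> counts = instanceCounts(pipeline, 4);
        for (int i = 0; i < pipeline.size(); i++) {
            System.out.println(pipeline.get(i).name + ": " + counts.get(i) + " instance(s)");
        }
    }
}
```

With four threads, the two leading annotators would each get four instances in this model, while the writer and everything after it would get one, which is why a component that writes all results to a single file can rely on seeing every CAS.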
Disclosure: I'm on the Apache UIMA project.
There is no difference between them in the current version. Historically, a CASConsumer would typically not modify the CAS. It would only use the data already in the CAS (previously added by an Analysis Engine) to aggregate it or prepare it for use in other systems, e.g., ingestion into databases.
In the current version, it is recommended that CASConsumers be replaced by Analysis Engine components.