UIMA Ruta - why does it not work?


For the text

level 110 KwH

I've got the rule

W{REGEXP("level")} NUM{-> MARK(energy_consumption)} W{REGEXP("KwH")}

but it doesn't work.

The text level 110 KH with the rule

 W{REGEXP("level")} NUM{-> MARK(energy_consumption)} W{REGEXP("KH")}  

works. Why doesn't the first rule work?


There is 1 answer

Peter Kluegl

The rule does not match because "KwH" is not one W annotation but two ("Kw" and "H", actually of type CW), whereas "KH" is a single W annotation (actually CAP). The REGEXP condition is evaluated against the covered text of the matched annotation, which is only "Kw", so the regex "KwH" is never fulfilled.

You may want to consider using a tokenizer in addition to the Ruta seeder, a dictionary lookup, or some rules that combine the two CW annotations. The initial annotations provided by Ruta are just a starting point, not real tokens.
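As a minimal sketch of the last option (rules that span both parts of "KwH"), the rule from the question could be written with one rule element per seeded annotation. This is an illustration, not a tested script: it assumes the default seeder splits "KwH" into "Kw" and "H" as described above, and the DECLARE line is an assumption because the question does not show where energy_consumption is defined.

```ruta
// Assumption: the type is declared in this script; the question does not show its declaration.
DECLARE energy_consumption;

// "KwH" is seeded as two W annotations, so each part gets its own rule element.
W{REGEXP("level")} NUM{-> MARK(energy_consumption)} W{REGEXP("Kw")} W{REGEXP("H")};
```

W is used for both parts instead of a more specific type, so the rule does not depend on exactly which subtype the seeder assigns to each piece.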

DISCLAIMER: I am a developer of UIMA Ruta.