For the text
level 110 KwH
I've got the rule
W{REGEXP("level")} NUM{-> MARK(energy_consumption)} W{REGEXP("KwH")}
but it doesn't work.
The text level 110 KH
with the rule
W{REGEXP("level")} NUM{-> MARK(energy_consumption)} W{REGEXP("KH")}
works. Why doesn't the first rule work?
The rule does not match because "KwH" is not one W annotation but two W annotations (actually two CW annotations, "Kw" and "H"). "KH" is only one W annotation (actually CAP). The REGEXP condition is evaluated on the covered text of the matched annotation (which is "Kw"), so the regex is never fulfilled.
You may want to consider using a tokenizer in addition to the Ruta seeder, or a dictionary lookup, or rules that combine the two CW annotations. The initial annotations provided by Ruta are just a starting point, not real tokens.
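As a minimal, untested sketch of the last option, a rule can match the two CW annotations one after the other instead of expecting a single W. It reuses the type name energy_consumption from the question; the DECLARE line is an assumption and is only needed if the type is not already declared elsewhere in the script or type system.

```
// Assumption: energy_consumption is not declared elsewhere.
DECLARE energy_consumption;

// "KwH" is seeded as two CW annotations, "Kw" and "H",
// so each one gets its own rule element and its own REGEXP condition.
W{REGEXP("level")} NUM{-> MARK(energy_consumption)} CW{REGEXP("Kw")} CW{REGEXP("H")};
```

This keeps the mark on the number only, as in the original rule. It relies on the default seeder splitting "KwH" exactly as described above, so it would need adjusting if a different tokenizer is used.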
DISCLAIMER: I am a developer of UIMA Ruta