How to custom tag word(s) in GATE JAPE grammar?


I have a set of documents, each with a different heading. For example, if a document's heading says "Psychological Evaluation", I want to tag the document as "Medicalrule".

  1. I loaded the document and loaded ANNIE with defaults.
  2. In Processing Resources > New > Jape Transducer 2.1 I wrote the following code in a text document and saved it with a .jape extension

CODE :


Phase: ConjunctionIdentifier
Input: Token Split    
Rule: Medicalrule
(
({Token.string=="Psychological"})+({Token.string == " "})+ ({Token.string == "Evaluation"}):Meddoc({Token.kind=="word"})
)

--> 
:Meddoc
  {
    gate.AnnotationSet matchedAnns = (gate.AnnotationSet) bindings.get("Meddoc");
    gate.FeatureMap newFeatures = Factory.newFeatureMap();
    newFeatures.put("rule", "Medicalrule");
    annotations.add(matchedAnns.firstNode(), matchedAnns.lastNode(), "CC", newFeatures);
 }

  3. I loaded the .jape file created above and reinitialized.

After the application is run, the annotation set does not show the tag.

Am I doing something wrong? It would be great if someone could help me with this.

Appreciate your time.

Thank you

2

There are 2 answers

1
Ian Roberts On BEST ANSWER

There are three issues I can see here.

  • First, as ashingel says, spaces are not represented as Token annotations - this is deliberate as in most cases you don't care about the spacing between words, only the words themselves.
  • Second, the trailing ({Token.kind=="word"}) means that the rule will only match when "Psychological Evaluation" is followed by another word before the end of the current sentence (because you've got Split in the Input line).
  • Third, you're only binding the Meddoc label to the "Evaluation" token, not to the whole match.
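
To illustrate the first point: a rule that genuinely needed to see the whitespace would have to list SpaceToken on the Input line and match it explicitly, because the tokeniser emits spaces as SpaceToken annotations rather than Token annotations. This is only a sketch; the phase and rule names are made up for illustration, and the declarative RHS is used just to keep it short.

```jape
Phase: SpaceAwareHeading
Input: Token SpaceToken

// Illustrative only: with SpaceToken on the Input line, every space
// between the two words has to be matched explicitly.
Rule: MedicalruleWithSpaces
(
  {Token.string == "Psychological"}
  ({SpaceToken})+
  {Token.string == "Evaluation"}
):meddoc
-->
:meddoc.CC = {rule = "Medicalrule"}
```

In most cases leaving SpaceToken out of Input altogether is less work, since the spaces then never need to be mentioned.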

I would try and simplify the LHS of the rule:

Phase: ConjunctionIdentifier
Input: Token Split    
Rule: Medicalrule
(
  {Token.string=="Psychological"}
  {Token.string == "Evaluation"}
):meddoc

For the RHS: (a) you don't need the explicit bindings.get, because you've used a labelled block and so already have the bound annotations available (as meddocAnnots); (b) you should use outputAS instead of annotations; and (c) you should generally avoid the add method that takes nodes, as it isn't safe if the input and output annotation sets are different. If you're using a recent snapshot of GATE, the gate.Utils static methods can help you a lot here:

:meddoc {
    Utils.addAnn(outputAS, meddocAnnots,"CC",
                 Utils.featureMap("rule","Medicalrule"));
}

If you're using GATE 7.1 or earlier, the addAnn method isn't available, so it's slightly more convoluted:

:meddoc {
  try {
    outputAS.add(Utils.start(meddocAnnots), Utils.end(meddocAnnots),"CC",
                 Utils.featureMap("rule","Medicalrule"));
  } catch(InvalidOffsetException e) { // can't happen, but won't compile without
    throw new JapeException(e);
  }
}
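
Assembled into a single file, the simplified LHS and the Utils.addAnn RHS above would look roughly like this. Nothing new is introduced apart from the --> separator and the comment; treat it as a sketch to adapt rather than a tested grammar.

```jape
Phase: ConjunctionIdentifier
Input: Token Split

Rule: Medicalrule
(
  {Token.string == "Psychological"}
  {Token.string == "Evaluation"}
):meddoc
-->
:meddoc {
  // meddocAnnots is supplied by the labelled block and spans both tokens
  Utils.addAnn(outputAS, meddocAnnots, "CC",
               Utils.featureMap("rule", "Medicalrule"));
}
```

When no Java logic is needed, the declarative form :meddoc.CC = {rule = "Medicalrule"} should create an equivalent annotation without a Java block.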

Finally, just to check, you did definitely add your new JAPE Transducer PR to the end of the pipeline?

0
ashingel On

I'm sure that there is no annotation matching Token.string == " ". Try using a SpaceToken annotation instead. Also, why not try gazetteers instead of hardcoding text values into the JAPE code?
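
A minimal sketch of the gazetteer route, under some assumptions: a list file (hypothetically named medical_headings.lst) contains the line "Psychological Evaluation", and it is registered in the gazetteer's lists.def as medical_headings.lst:medical_heading, so that the gazetteer produces Lookup annotations with majorType "medical_heading". The file name, major type, phase name and rule name are all invented for illustration.

```jape
Phase: MedicalHeading
Input: Lookup

// Assumes the gazetteer has already run and produced
// Lookup annotations with the hypothetical majorType "medical_heading".
Rule: MedicalruleFromGazetteer
(
  {Lookup.majorType == "medical_heading"}
):meddoc
-->
:meddoc.CC = {rule = "Medicalrule"}
```

New headings can then be added to the list file without touching the grammar, as long as this transducer runs after the gazetteer in the pipeline.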