I'm following the official instructions for adding custom SUTime rules for fiscal year quarters (stuff like Q1, Q2, Q3 and Q4).
I used the default defs.sutime.txt and english.sutime.txt as templates for my own rule files.
After appending the following code to my defs.sutime.txt
// Financial Quarters
FYQ1 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ1",
  value: TimeWithRange(TimeRange(IsoDate(ANY,10,1), IsoDate(ANY,12,31), QUARTER))
}
FYQ2 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ2",
  value: TimeWithRange(TimeRange(IsoDate(ANY,1,1), IsoDate(ANY,3,31), QUARTER))
}
FYQ3 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ3",
  value: TimeWithRange(TimeRange(IsoDate(ANY,4,1), IsoDate(ANY,6,30), QUARTER))
}
FYQ4 = {
  type: QUARTER_OF_YEAR,
  label: "FYQ4",
  value: TimeWithRange(TimeRange(IsoDate(ANY,7,1), IsoDate(ANY,9,30), QUARTER))
}
and appending the following code to my english.sutime.txt
# Financial Quarters
FISCAL_YEAR_QUARTER_MAP = {
  "Q1": FYQ1,
  "Q2": FYQ2,
  "Q3": FYQ3,
  "Q4": FYQ4
}
FISCAL_YEAR_QUARTER_YEAR_OFFSETS_MAP = {
  "Q1": 1,
  "Q2": 0,
  "Q3": 0,
  "Q4": 0
}
$FiscalYearQuarterTerm = CreateRegex(Keys(FISCAL_YEAR_QUARTER_MAP))
{
  matchWithResults: TRUE,
  pattern: ((/$FiscalYearQuarterTerm/) (FY)? (/(FY)?([0-9]{4})/)),
  result: TemporalCompose(INTERSECT, IsoDate(Subtract({type: "NUMBER", value: $$3.matchResults[0].word.group(2)}, FISCAL_YEAR_QUARTER_YEAR_OFFSETS_MAP[$1[0].word]), ANY, ANY), FISCAL_YEAR_QUARTER_MAP[$1[0].word])
}
{
  pattern: ((/$FiscalYearQuarterTerm/)),
  result: FISCAL_YEAR_QUARTER_MAP[$1[0].word]
}
I'm still unable to correctly parse stuff like "Q1 2020".
How can I properly add rules for parsing fiscal year quarters (e.g. "Q1")?
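To be explicit about what I expect these rules to produce, here is a minimal plain-Java sketch of the intended mapping. It does not use SUTime at all; the class name FiscalQuarterSketch and the helper range are only illustrative, and it assumes the fiscal year starts on October 1, as in the definitions above.

```java
import java.time.LocalDate;
import java.util.Map;

class FiscalQuarterSketch {
    // First month of each fiscal quarter, mirroring FYQ1..FYQ4 above.
    static final Map<String, Integer> START_MONTH =
            Map.of("Q1", 10, "Q2", 1, "Q3", 4, "Q4", 7);

    // Years to subtract from the fiscal year to get the calendar year,
    // mirroring FISCAL_YEAR_QUARTER_YEAR_OFFSETS_MAP above.
    static final Map<String, Integer> YEAR_OFFSET =
            Map.of("Q1", 1, "Q2", 0, "Q3", 0, "Q4", 0);

    /** Returns {first day, last day} of the given fiscal quarter. */
    static LocalDate[] range(String quarter, int fiscalYear) {
        int calendarYear = fiscalYear - YEAR_OFFSET.get(quarter);
        LocalDate start = LocalDate.of(calendarYear, START_MONTH.get(quarter), 1);
        // Each quarter spans three calendar months.
        LocalDate end = start.plusMonths(3).minusDays(1);
        return new LocalDate[] {start, end};
    }

    public static void main(String[] args) {
        LocalDate[] q1 = range("Q1", 2020);
        System.out.println("Q1 2020: " + q1[0] + " / " + q1[1]);
    }
}
```

So for "Q1 2020" I would expect a range from 2019-10-01 to 2019-12-31, and for "Q3 2020" a range from 2020-04-01 to 2020-06-30.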
Here's my full code:
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.time.*;
import edu.stanford.nlp.util.CoreMap;

public class SUTimeSoExample {

  public static void main(String[] args) {
    Properties props = new Properties();
    props.setProperty("sutime.includeRange", "true");
    props.setProperty("sutime.markTimeRanges", "true");
    props.setProperty("sutime.rules", "./defs.sutime.txt,./english.sutime.txt");

    AnnotationPipeline pipeline = new AnnotationPipeline();
    pipeline.addAnnotator(new TokenizerAnnotator(false));
    pipeline.addAnnotator(new WordsToSentencesAnnotator(false));
    pipeline.addAnnotator(new POSTaggerAnnotator(false));
    pipeline.addAnnotator(new TimeAnnotator("sutime", props));

    String input = "Stuff for Q1 2020";
    Annotation annotation = new Annotation(input);
    annotation.set(CoreAnnotations.DocDateAnnotation.class, "2020-06-01");
    pipeline.annotate(annotation);
    System.out.println(annotation.get(CoreAnnotations.TextAnnotation.class));

    List<CoreMap> timexAnnsAll = annotation.get(TimeAnnotations.TimexAnnotations.class);
    for (CoreMap cm : timexAnnsAll) {
      System.out.println(cm // match
          + " --> " + cm.get(TimeExpression.Annotation.class).getTemporal() // parsed value
      );
    }
  }
}
Note that I deleted the default defs.sutime.txt and english.sutime.txt files from the Stanford CoreNLP models JAR in order to avoid this issue.
There is a Java code example here:
https://stanfordnlp.github.io/CoreNLP/sutime.html
It should work if you follow that example, mainly building your pipeline in the manner shown there, and make sure to use version 4.0.0.
You can set ner.rulesOnly to true if you just want to run SUTime without running the statistical models. You can use one of several properties for ner.docDate or just set the document date in the annotation before running.
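As a rough sketch of the settings mentioned above, the properties could be assembled like this. The block only builds a java.util.Properties object and uses nothing from CoreNLP itself. The class name SUTimePipelineProps and the method build are made up for illustration. The key ner.docdate.useFixedDate is my assumption for one of the document-date properties, and I am also assuming the sutime.* keys are passed through to SUTime by the ner annotator, so check both against the documentation for your version.

```java
import java.util.Properties;

class SUTimePipelineProps {

    /** Builds pipeline properties; docDate is an ISO date such as "2020-06-01". */
    static Properties build(String docDate) {
        Properties props = new Properties();
        // Annotator list in the style of the example on the SUTime page.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // Run only the rule-based components, skipping the statistical models.
        props.setProperty("ner.rulesOnly", "true");
        // Assumption: one of the document-date properties; verify the exact key.
        props.setProperty("ner.docdate.useFixedDate", docDate);
        // Custom rule files and range options taken from the question.
        props.setProperty("sutime.rules", "./defs.sutime.txt,./english.sutime.txt");
        props.setProperty("sutime.includeRange", "true");
        props.setProperty("sutime.markTimeRanges", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("2020-06-01");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
        // With CoreNLP on the classpath, these properties would be handed to
        // the pipeline constructor, e.g. new StanfordCoreNLP(props).
    }
}
```

Keeping the settings in one place like this makes it easy to toggle ner.rulesOnly or swap the rule files while debugging.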