Custom rules for SUTime (Stanford Temporal Tagger)


I've been trying to add custom rules to SUTime and I'm stuck on one of them. I've tried various approaches, and none of them gives the result I want.

The input '5 - 8 years' returns the following:

{
    'text'  : '5 - 8 years',
    'type'  : 'DURATION',
    'value' : 'P5Y/P8Y'
}

The input '5-8 years' (no spaces around the hyphen) returns the following:

{
    'text'  : 'years',
    'type'  : 'DURATION',
    'value' : 'PXY'
}

My understanding is:
- '5-8' and 'years' are the tokens SUTime produces.
- '5-8' is not mapped to anything, so it is not considered relevant.

What I've tried:

These are the results of running each attempt on '5-8 years'.

{ text: /(\d+)[-](\d+)[\s]+($TEUnits)(s)?/ =>
    Duration( TIMEUNIT_MAP[Lowercase($3)], $2) }

    This gives 'value': 'P8Y', as expected, but I want 'P5Y/P8Y' as the value.
  • I tried various composite rules, but the text rule above is the closest I got to a solution.
  • I tried various forms of Duration, such as Duration( $1, $2, TIMEUNIT_MAP[Lowercase($3)] ) and Duration( $1, $2, TIMEUNIT_MAP[Lowercase($3)], TIME_UNKNOWN, TIME_REF ), and many other combinations, even ones that didn't make sense. In every case the rule was ignored, so the output was unchanged.
  • I tried cleaning the data by replacing (digit-digit) with (digit - digit), but that led to problems in detecting dates.
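
The date problem in the last bullet probably comes from rewriting every digit-digit pair, including the ones inside dates such as 2019-05-12. A narrower preprocessing step, one that only touches a range directly followed by a time-unit word, might avoid that. This is a minimal Python sketch, not part of SUTime: the unit list and the function name space_out_ranges are assumptions made for illustration.

```python
import re

# Assumed unit list for illustration; extend it to match the units you need.
UNITS = r"(?:second|minute|hour|day|week|month|year|decade)s?"

# A digit-digit range that is not glued to other digits or hyphens (so date
# fragments like 2019-05-12 are skipped) and is followed by a time unit.
RANGE_BEFORE_UNIT = re.compile(
    r"(?<![\d-])(\d+)-(\d+)(?=\s+" + UNITS + r"\b)", re.IGNORECASE
)

def space_out_ranges(text):
    """Rewrite '5-8 years' as '5 - 8 years' and leave other hyphens alone."""
    return RANGE_BEFORE_UNIT.sub(r"\1 - \2", text)

print(space_out_ranges("I've been reading from the last 5-8 years"))
print(space_out_ranges("Released on 2019-05-12, about 2-3 weeks late"))
```

The rewritten text can then be passed to SUTime, which already handles the spaced form '5 - 8 years' as shown at the top of the question.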

Gist

How do I get value tags of the form P5Y/P8Y from the text "I've been reading from the last 5-8 years"?
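
As a fallback outside SUTime, the target value could be assembled from the same capture groups that the text rule above already matches. The sketch below is plain Python, not SUTime rule syntax. The unit-to-designator table and the name duration_range are assumptions for illustration, and only date-based units are covered.

```python
import re

# Assumed mapping from unit words to ISO 8601 duration designators.
# Only date-based units are covered; time units would need a 'PT' prefix.
DESIGNATORS = {"day": "D", "week": "W", "month": "M", "year": "Y"}

# Similar shape to the text rule: (number)-(number) whitespace (unit)(s)?,
# with optional spaces allowed around the hyphen.
RANGE = re.compile(r"(\d+)\s*-\s*(\d+)\s+(day|week|month|year)s?\b", re.IGNORECASE)

def duration_range(text):
    """Return a 'P5Y/P8Y'-style value for the first range found, else None."""
    match = RANGE.search(text)
    if match is None:
        return None
    low, high, unit = match.groups()
    designator = DESIGNATORS[unit.lower()]
    return f"P{int(low)}{designator}/P{int(high)}{designator}"

print(duration_range("I've been reading from the last 5-8 years"))  # P5Y/P8Y
```

This does not explain why the Duration variants are ignored inside SUTime. It only shows that both bounds are available from one regular-expression match.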

I've made an honest effort and have gone through most of the rules file, the accompanying documentation, and 3 other Stack Overflow questions several times. I also got three other rules to work.
