Is there a way to convert TIMEX3 words to their actual values?


For example, for PRESENT_REF I would need to get “Monday, April 27, 2015 14:22 PM”

I experimented with HeidelTime (code below) on simple sentences, like "In three hours from now I will finish this program".

HeidelTimeStandalone heidelTime = new HeidelTimeStandalone(
        Language.ENGLISH,
        DocumentType.NEWS,
        OutputType.TIMEML,
        "C:/heideltime/heideltime-standalone/config.props", 
        POSTagger.TREETAGGER, true);

// Document creation time 
Date dct = new Date();
String text = "In three hours from now I will finish this program.";
String result = heidelTime.process(text, dct);

For this specific one, HeidelTime produces the annotations

<?xml version="1.0"?>
<!DOCTYPE TimeML SYSTEM "TimeML.dtd">
<TimeML>
In <TIMEX3 tid="t2" type="DURATION" value="PT3H">three hours</TIMEX3> from <TIMEX3 tid="t1" type="DATE" value="PRESENT_REF">now</TIMEX3> we will finish this program
</TimeML>

while I would need to get something like

At <TIMEX3 tid="t6" type="DATE" value="2015-04-27">   <TIMEX3 tid="t8" type="TIME" value="2015-04-27T17:22">17:22 PM</TIMEX3> I will finish this program

Is there a way to achieve this?

There are 1 answers

Jannik On

HeidelTime [1] tries to extract and normalize temporal expressions following the TimeML guidelines [2], with a focus on the attributes "type" and "value".

Regarding your first example: following TimeML, expressions such as "now" are annotated as "PRESENT_REF", so HeidelTime's annotation is correct, although probably not useful in your case. You say:

for PRESENT_REF I would need to get “Monday, April 27, 2015 14:22 PM”

If you want to "translate" "PRESENT_REF" into an actual value, you can assume that PRESENT_REF refers to the document creation time of an article (although this might be incorrect in some cases, in particular if you are not processing news-style documents). You can then use a SimpleDateFormat to get the output you want:

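import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Locale;
...
// formatIn.parse(...) throws ParseException, so the enclosing method must declare or catch it
Calendar c = Calendar.getInstance();
String dct = "2015-04-27T14:22";
// HH is the hour of day (0-23); hh would be the 12-hour clock
SimpleDateFormat formatIn = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm");
c.setTime(formatIn.parse(dct));
// Locale.ENGLISH keeps day and month names in English regardless of the system locale
SimpleDateFormat formatOut = new SimpleDateFormat("EEEE, MMMM dd, yyyy HH:mm a", Locale.ENGLISH);
String dctText = formatOut.format(c.getTime());
System.out.println(dctText);
// prints: Monday, April 27, 2015 14:22 PM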

Your second example is less straightforward. Considering the extents of temporal expressions and the attributes type and value, the annotations for the expression that are created by HeidelTime are correct, i.e.,

<TIMEX3 tid="t1" type="DURATION" value="PT3H">three hours</TIMEX3> 
from
<TIMEX3 tid="t2" type="DATE" value="PRESENT_REF">now</TIMEX3>

Sometimes, further annotations are desirable. For instance, following TimeML, it is possible to anchor durations and to assign "beginPoint" and/or "endPoint" information to duration annotations. Unfortunately, HeidelTime does not do that.

However, for some kinds of expressions, HeidelTime can add annotations for non-standard TIMEX3s, in particular if two expressions describe a time interval. For example, for the phrase "from 1910 to 1950", the standard TIMEX3 annotations are:

from <TIMEX3 tid="t1">1910</TIMEX3> to <TIMEX3 tid="t2">1950</TIMEX3>

If you additionally use HeidelTime's interval tagger, a TIMEX3INTERVAL is added, which contains the earliest and latest begin and end points of the interval, i.e.,

<TIMEX3INTERVAL earliestBegin="1910-01-01T00:00:00" 
                latestBegin="1910-12-31T23:59:59"
                earliestEnd="1950-01-01T00:00:00" 
                latestEnd="1950-12-31T23:59:59">
<TIMEX3 tid="t1" type="DATE" value="1910">1910</TIMEX3> 
to 
<TIMEX3 tid="t2" type="DATE" value="1950">1950</TIMEX3>
</TIMEX3INTERVAL>

Time intervals are thus covered, but values calculated from multiple simple TIMEX3 expressions are not yet supported.

If you want to write an extension covering that, you can start with the SimpleDateFormat example above, parse the duration values (e.g., PT3H), and perform the date arithmetic with a call such as c.add(Calendar.HOUR_OF_DAY, 3), which adds three hours to the original c.
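As a rough illustration of such an extension, here is a minimal sketch that reads the TIMEX3 tags from the TimeML output, parses the duration, and adds it to the document creation time. It is not part of HeidelTime: the class TimexAnchor and the method resolve are hypothetical names, it uses java.time (Java 8+) instead of Calendar, and it assumes that PRESENT_REF means the document creation time and that "type" precedes "value" inside each tag, as in the output shown above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimexAnchor {
    // Captures the type and value attributes of a TIMEX3 tag.
    // Assumption: "type" appears before "value", as in the output above.
    private static final Pattern TIMEX =
            Pattern.compile("<TIMEX3 [^>]*type=\"([A-Z]+)\"[^>]*value=\"([^\"]+)\"[^>]*>");

    // Hypothetical helper (not a HeidelTime API): if the text contains a DURATION
    // and a PRESENT_REF, returns document creation time + duration, otherwise null.
    public static String resolve(String timeMl, String dct) {
        Duration offset = null;
        boolean anchoredAtNow = false;
        Matcher m = TIMEX.matcher(timeMl);
        while (m.find()) {
            if (m.group(1).equals("DURATION") && offset == null) {
                offset = Duration.parse(m.group(2));   // ISO-8601 duration, e.g. PT3H
            } else if (m.group(2).equals("PRESENT_REF")) {
                anchoredAtNow = true;                  // assumed to mean the dct
            }
        }
        if (offset == null || !anchoredAtNow) {
            return null;
        }
        LocalDateTime start = LocalDateTime.parse(dct); // e.g. 2015-04-27T14:22
        return start.plus(offset).format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm"));
    }

    public static void main(String[] args) {
        String timeMl = "In <TIMEX3 tid=\"t2\" type=\"DURATION\" value=\"PT3H\">three hours</TIMEX3> "
                + "from <TIMEX3 tid=\"t1\" type=\"DATE\" value=\"PRESENT_REF\">now</TIMEX3> "
                + "I will finish this program";
        System.out.println(resolve(timeMl, "2015-04-27T14:22"));
        // prints: 2015-04-27T17:22
    }
}
```

Note that Duration.parse only handles day and time components (such as PT3H or P2D). Durations with months or years (such as P1M) would need java.time.Period instead, and a real extension should use an XML parser rather than a regular expression.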

If you write an extension and want to add it to HeidelTime, let us know ;-)

[1] https://github.com/HeidelTime/heideltime

[2] http://timeml.org/publications/timeMLdocs/annguide_1.2.1.pdf