For example, for PRESENT_REF I would need to get “Monday, April 27, 2015 14:22”.
I experimented with HeidelTime (code below) on simple sentences, like "In three hours from now I will finish this program".
HeidelTimeStandalone heidelTime = new HeidelTimeStandalone(
Language.ENGLISH,
DocumentType.NEWS,
OutputType.TIMEML,
"C:/heideltime/heideltime-standalone/config.props",
POSTagger.TREETAGGER, true);
// Document creation time
Date dct = new Date();
String text = "In three hours from now I will finish this program.";
String result = heidelTime.process(text, dct);
For this specific sentence, HeidelTime produces the annotations:
<?xml version="1.0"?>
<!DOCTYPE TimeML SYSTEM "TimeML.dtd">
<TimeML>
In <TIMEX3 tid="t2" type="DURATION" value="PT3H">three hours</TIMEX3> from <TIMEX3 tid="t1" type="DATE" value="PRESENT_REF">now</TIMEX3> I will finish this program.
</TimeML>
while I would need to get something like:
At <TIMEX3 tid="t6" type="DATE" value="2015-04-27"><TIMEX3 tid="t8" type="TIME" value="2015-04-27T17:22">17:22</TIMEX3></TIMEX3> I will finish this program
Is there a way to achieve this?
HeidelTime [1] tries to extract and normalize temporal expressions following the TimeML guidelines [2], with a focus on the attributes "type" and "value".
Regarding your first example: following TimeML, expressions such as "now" are to be annotated as "PRESENT_REF", so HeidelTime's annotation is not incorrect, although it is probably not useful in your case, since you need an actual date and time for PRESENT_REF.
If you want to "translate" "PRESENT_REF" into an actual value, you can assume that PRESENT_REF always refers to the document creation time of an article (although this might be incorrect in some cases, in particular if you are not processing news-style documents). You can then format the document creation time with a date formatter such as java.text.SimpleDateFormat to get the information you want.
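A minimal sketch of that formatting step, assuming PRESENT_REF is simply replaced by the document creation time. The class name PresentRefFormatter and the helper resolvePresentRef are hypothetical and not part of HeidelTime; only standard JDK classes are used.

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;

public class PresentRefFormatter {
    // Hypothetical helper (not a HeidelTime API): renders the document
    // creation time as the string that PRESENT_REF should be replaced with.
    static String resolvePresentRef(Date dct) {
        // EEEE = full weekday name, MMMM = full month name, HH = 24-hour clock
        SimpleDateFormat fmt =
                new SimpleDateFormat("EEEE, MMMM d, yyyy HH:mm", Locale.ENGLISH);
        return fmt.format(dct);
    }

    public static void main(String[] args) {
        // Fixed document creation time taken from the question's example
        Date dct = new GregorianCalendar(2015, Calendar.APRIL, 27, 14, 22).getTime();
        System.out.println(resolvePresentRef(dct)); // Monday, April 27, 2015 14:22
    }
}
```

In a real run you would pass the same Date object that was handed to heidelTime.process(text, dct), so that the formatted value and the annotation refer to the same point in time.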
Your second example is less straightforward. Considering the extents of the temporal expressions and the attributes type and value, the annotations created by HeidelTime are correct, i.e., "three hours" is a DURATION with the value PT3H, and "now" is a DATE with the value PRESENT_REF.
Sometimes, further annotations are desirable. For instance, following TimeML, it is possible to anchor durations and to assign "beginPoint" and/or "endPoint" information to duration annotations. Unfortunately, HeidelTime does not do that.
However, for some kinds of expressions, HeidelTime can add annotations for non-standard TIMEX3s, in particular if two expressions describe a time interval. For the phrase "from 1910 to 1950", for example, the standard TIMEX3 annotations are two DATE expressions with the values 1910 and 1950.
If you additionally use HeidelTime's interval tagger, a TIMEX3INTERVAL is added, which contains the earliest and latest begin and end points of the interval.
Time intervals are thus covered, but calculated values resulting from multiple simple TIMEX3 expressions are not yet supported.
If you want to write an extension covering that, you can start from the formatted document creation time, parse the duration values (e.g., PT3H), and perform a date calculation such as c.add(Calendar.HOUR, 3), which adds three hours to the original Calendar c.
If you write an extension and want to add it to HeidelTime, let us know ;-)
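The duration arithmetic suggested above could be sketched roughly as follows. Note that this sketch uses java.time (Java 8+) rather than Calendar, because Duration.parse already understands values such as PT3H. The class DurationAnchor and the method anchor are hypothetical names for illustration, not part of HeidelTime, and the sketch assumes the duration is anchored at the document creation time.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class DurationAnchor {
    // Hypothetical helper (not a HeidelTime API): adds a TIMEX3 DURATION value
    // such as "PT3H" to the document creation time.
    static LocalDateTime anchor(LocalDateTime dct, String timex3Duration) {
        // Duration.parse accepts day- and time-based ISO 8601 values (P3D, PT3H, PT30M).
        // Week-, month- and year-based values (P2W, P2M, P1Y) and underspecified
        // values (PXD) are not handled here and would throw an exception.
        Duration duration = Duration.parse(timex3Duration);
        return dct.plus(duration);
    }

    public static void main(String[] args) {
        // Document creation time from the question's example
        LocalDateTime dct = LocalDateTime.of(2015, 4, 27, 14, 22);
        LocalDateTime end = anchor(dct, "PT3H");
        // Format the result like a TIMEX3 TIME value
        System.out.println(end.format(DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm"))); // 2015-04-27T17:22
    }
}
```

Deciding which DURATION belongs to which anchor ("three hours" and "now" in the example) still has to be done on the TimeML output; the sketch only covers the calculation itself.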
[1] https://github.com/HeidelTime/heideltime
[2] http://timeml.org/publications/timeMLdocs/annguide_1.2.1.pdf