dateparser search_dates returns impossible year


I'm using dateparser's "search_dates" to parse text for dates and got a strange date in my result.

dateparser.__version__
'1.1.8'


import datetime
from dateparser.search import search_dates

settings = {
    # Reference "now" used to resolve relative expressions
    'RELATIVE_BASE': datetime.datetime(2023, 7, 31, 0, 0),
    'PREFER_DAY_OF_MONTH': 'first',
    'PREFER_DATES_FROM': 'future',
    'REQUIRE_PARTS': ['year', 'month'],
    'DATE_ORDER': 'YMD'
}
s = 'Closing Yield, 2010 Year Treasury notes On Dec 31, 2023'
search_dates(s, settings=settings)

Result:

Out[27]: 
[('2010 Year', datetime.datetime(4033, 7, 31, 0, 0)),
 ('On Dec 31, 2023', datetime.datetime(2023, 12, 31, 0, 0))]

The first item in the list yields an impossible result (year = 4033).

Any ideas here?

leeprevost answered:

With help from @Malcolm and a contributor via GitHub:

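This is because "year" is interpreted the same as "years", so “2010 Year” is read as the relative expression “2010 years later”. Counted from the RELATIVE_BASE of 2023-07-31, that gives 2023 + 2010 = 4033, which is why the month and day of the odd result match the base date.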

The contributor's suggestion: maybe dateparser could make the singular "year" behave this way only for “1 year”, and otherwise translate it to, for example, “year 2010”. But it may not be trivial to address.
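
Until something like that lands in the library, one possible workaround is to filter the results after the fact. This is only a minimal sketch: the helper name drop_implausible_dates and the 100-year cutoff are my own assumptions, not part of dateparser. The sample list is copied from the output in the question so the snippet runs on its own; in real use you would pass in whatever search_dates returns (which can be None when nothing is found).

```python
import datetime


def drop_implausible_dates(results, base, max_years_ahead=100):
    """Keep only (text, datetime) pairs whose year is at most
    base.year + max_years_ahead.

    Hypothetical helper, not a dateparser API. `results` may be None,
    which is treated as an empty list.
    """
    latest_year = base.year + max_years_ahead
    return [(text, dt) for text, dt in (results or []) if dt.year <= latest_year]


# Same reference date as RELATIVE_BASE in the question
base = datetime.datetime(2023, 7, 31)

# Results copied from the question's output
results = [
    ('2010 Year', datetime.datetime(4033, 7, 31, 0, 0)),
    ('On Dec 31, 2023', datetime.datetime(2023, 12, 31, 0, 0)),
]

filtered = drop_implausible_dates(results, base)
print(filtered)
# [('On Dec 31, 2023', datetime.datetime(2023, 12, 31, 0, 0))]
```

The cutoff is a judgment call. It hides the symptom rather than fixing the parse, so a short phrase such as "50 years" would still come back as a relative date.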