I want to extract dates written as either Day Month Year or Month Day Year.
For example: 14 January, 2005 or Feb 29 1982
The code I'm using: date = re.findall(r'\d{1,3} Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec|January|February|March|April|May|June|July|August|September|October|November|December \d{1,3}[, ]\d{4}',line)
Python interprets this as one top-level alternation: 1-3 digits followed by "Jan", or any one of the other month names on its own, or "December" followed by the day and year. So it matches only "Feb" or "12 Jan", but not the rest of the date.
So how do I group ONLY the months, so that the | applies to the month names and not to the rest of the expression?
Answering your question directly: wrap the month names in a group, e.g. (?:Jan|Feb|...), so that the | applies only inside the parentheses. You can then make two regexps, one for your "Day Month Year" format and one for "Month Day Year", and check them separately.
Try both and see which one matches. However, you'll still need to inspect the match and convert it to your favourite flavour of date format.
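As a rough sketch of the two-regexp approach: the names MONTH, DAY_MONTH_YEAR, MONTH_DAY_YEAR and find_dates below are made up for illustration, and the patterns assume a 1-2 digit day, an optional comma and single spaces between the parts.

```python
import re

# The non-capturing group (?:...) confines the | alternation to the month names.
MONTH = (r'(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|'
         r'July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|'
         r'Dec(?:ember)?)')

# One pattern per layout; \b stops matches from starting or ending mid-word.
DAY_MONTH_YEAR = re.compile(r'\b\d{1,2} ' + MONTH + r',? \d{4}\b')
MONTH_DAY_YEAR = re.compile(r'\b' + MONTH + r' \d{1,2},? \d{4}\b')


def find_dates(line):
    """Return all date-like substrings, Day Month Year matches first."""
    return DAY_MONTH_YEAR.findall(line) + MONTH_DAY_YEAR.findall(line)


print(find_dates('Born 14 January, 2005, moved on Feb 29 1982'))
```

Because the groups are non-capturing, findall returns the whole matched text rather than just the month. Note that the results are grouped by pattern, not by position in the line, and that nothing here checks whether the date actually exists.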
Instead, I would advise not using regexps at all and simply trying different date formats.
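Something along these lines would do it. This is only a sketch: the function name parse_date and the FORMATS list are assumptions for illustration, and %B / %b match English month names only under an English (or default C) locale.

```python
from datetime import datetime

# Hypothetical list of layouts to try; extend it to cover the inputs you expect.
FORMATS = [
    '%d %B, %Y', '%d %B %Y',   # 14 January, 2005 / 14 January 2005
    '%d %b, %Y', '%d %b %Y',   # 14 Jan, 2005 / 14 Jan 2005
    '%B %d, %Y', '%B %d %Y',   # February 28, 1982 / February 28 1982
    '%b %d, %Y', '%b %d %Y',   # Feb 28, 1982 / Feb 28 1982
]


def parse_date(text):
    """Try each known format in turn and return the first successful parse."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this layout did not fit, try the next one
    raise ValueError('no known date format matches %r' % text)
```

Unlike the regexp, strptime expects the whole string to be a date, so you would call this on a candidate substring rather than on a full line of text. It also validates the date: "Feb 29 1982" raises ValueError because 1982 was not a leap year.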
A function that does this takes a string and returns a datetime.datetime object. You can use the standard datetime.datetime attributes (day, month and year) to get your day, month and year back.
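For example, assuming a date string in the "Day Month, Year" layout and an English locale, reading the parts back and re-rendering them in another format could look like this:

```python
from datetime import datetime

# Parse one example string; the format here is an assumption about the input.
d = datetime.strptime('14 January, 2005', '%d %B, %Y')

# day, month and year are plain integer attributes on the datetime object.
parts = (d.day, d.month, d.year)

# strftime renders the same date in whatever layout you prefer.
iso_text = d.strftime('%Y-%m-%d')

print(parts, iso_text)
```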