php date_parse("Feb 2010") gives day == 1

There is what I would call a bug in date_parse when there is no day. $d = date_parse("Feb 2010") will give $d["day"] == 1.

See the comment on this on the date_parse manual page.

Any nice workaround for this problem? :-)

UPDATE The dates come from published research reports, so unfortunately their formats vary. I want to convert them to a more standard ISO format when displaying the references. To help readers, I want to include only the fields that were actually given (year, month, day). So this should be valid (and just give me the year):

2010

This should be valid, but just give me 2010-02 so to say:

Feb 2010

UPDATE 2 So far I have seen two bugs here in date_parse. It can't parse 2010. And it gives a day though there is no day in Feb 2010.

I can of course write a fix for this, but surely someone has already done this, or???
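
For reference, the two behaviours described under UPDATE 2 can be reproduced with a short script. This is only a minimal sketch: the note about `"2010"` being read as a time relies on the documented rule that a bare four-digit string is treated as HH MM unless a time has already been found, so it is worth re-checking on your own PHP version.

```php
<?php
// Reproduce the two date_parse() behaviours described in the question.

// 1) Month and year only: the parser fills in day 1.
$feb = date_parse("Feb 2010");
var_dump($feb["year"], $feb["month"], $feb["day"]);

// 2) Year only: "2010" is not taken as a year. The year field stays
//    false and the digits are read as a time of day instead.
$year = date_parse("2010");
var_dump($year["year"], $year["hour"], $year["minute"]);
```

In both cases the returned array gives no hint that a field was guessed rather than read, which is why the fixes on this page go back to the raw string.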

There are 3 answers

Leo (BEST ANSWER)

No answers, so I will answer my own question. Here is a workaround for the problems I saw.

// Work around for some bugs in date_parse (tested in PHP 5.5.19)
//   http://php.net/manual/en/function.date-parse.php
//
// Date formats that cannot be parsed correctly without this fix:
//   1) "2014" - Valid ISO 8601 date format but not recognized by date_parse.
//   2) "Feb 2010" - Parsed but gives ["day"] => 1.
function date_parse_5_5_bugfix($dateRaw) {
  // Check "2014" bug:
  $dateRaw = trim($dateRaw);
  if (strlen($dateRaw) === 4 && preg_match("/\d{4}/", $dateRaw) === 1) {
    $da = date_parse($dateRaw . "-01-01");
    $da["month"] = false;
    $da["day"] = false;
  } else {
    $da = date_parse($dateRaw);
    if ($da) {
      if (array_key_exists("year", $da)
          && array_key_exists("month", $da)
          && array_key_exists("day", $da))
        {
          if ($da["day"] === 1) {
            // Check "Feb 2010" bug:
            // http://www.phpliveregex.com/
            if (preg_match("/\b0?1(?:\b|T)/", $dateRaw) !== 1) {
              $da["day"] = false;
            }
          }
        }
    }
  }
  return $da;
}

Some tests (visual ;-) )

$a = date_parse_5_5_bugfix("2014"); print_r($a);
$b = date_parse_5_5_bugfix("feb 2010"); print_r($b);
$c = date_parse_5_5_bugfix("2014-01-01"); print_r($c);
$d = date_parse_5_5_bugfix("2014-11-01T06:43:08Z"); print_r($d);
$e = date_parse_5_5_bugfix("2014-11-01x06:43:08Z"); print_r($e);
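
To get from such an array to the partial ISO strings asked for in the question, a small formatter could be layered on top. The sketch below is an assumption, not part of the original answer: `format_partial_iso` is a hypothetical name, and it relies on missing fields being marked with `false`, as the bugfix function above does.

```php
<?php
// Hypothetical helper: turn a date_parse()-style array into an ISO 8601
// string containing only the fields that were actually given.
// Assumes missing fields are marked with false.
function format_partial_iso(array $da) {
  if (!isset($da["year"]) || $da["year"] === false) {
    return null; // Nothing usable without a year.
  }
  $out = sprintf("%04d", $da["year"]);
  if (!isset($da["month"]) || $da["month"] === false) {
    return $out; // Year only.
  }
  $out .= sprintf("-%02d", $da["month"]);
  if (!isset($da["day"]) || $da["day"] === false) {
    return $out; // Year and month.
  }
  return $out . sprintf("-%02d", $da["day"]);
}

// Hand-built arrays stand in for the parser output here.
echo format_partial_iso(array("year" => 2014, "month" => false, "day" => false)), "\n"; // 2014
echo format_partial_iso(array("year" => 2010, "month" => 2, "day" => false)), "\n";     // 2010-02
echo format_partial_iso(array("year" => 2014, "month" => 11, "day" => 1)), "\n";        // 2014-11-01
```

Keeping the formatting separate from the parsing fix means the same helper works with any routine that reports missing fields as `false`.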

happyvirus

Can you try:

$dateTime = strtotime('February, 2010');
echo date('Y-m', $dateTime);
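
One limitation of a timestamp-based approach is that the timestamp no longer records which fields the source text contained. The sketch below illustrates this with two inputs chosen for the example; the fixed UTC time zone is an assumption made only to keep the output reproducible.

```php
<?php
date_default_timezone_set("UTC"); // Fixed zone so the output is reproducible.

// Both inputs resolve to the same instant, so the timestamp alone
// cannot tell whether the source text contained a day.
$monthOnly = strtotime("Feb 2010");
$withDay   = strtotime("1 Feb 2010");

echo date("Y-m-d", $monthOnly), "\n";
echo date("Y-m-d", $withDay), "\n";
var_dump($monthOnly === $withDay);
```

So formatting with `'Y-m'` works when you already know the day is absent, but it does not help decide whether to print the day.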

Scott McDermott

The above bugfix routine is great, Leo, thanks. Unfortunately it still trips over January, thinking that 2014-01 is the same as 2014-01-01 --- we're eleven-twelfths of the way there.

The date formats that PHP can parse, that don't contain a day-of-month, appear to be (in php_src:date/lib/parse_date.re):

gnudateshorter   = year4 "-" month;
datenoday        = monthtext ([ .\t-])* year4;
datenodayrev     = year4 ([ .\t-])* monthtext;

Very few, conveniently. We can run the same regexes on $dateRaw, essentially reverse-engineering what the parser had decided.

(Side observations: the above excludes formats like 5/2016, which is parsed as "20 May with some extra characters at the end". The three formats are also similar to the day-of-year and week-of-year formats, so we'll try not to trip over those.)

function date_parse_bugfix($dateRaw) {
  $dateRaw = trim($dateRaw);
  // Check for just-the-year:
  if (strlen($dateRaw) === 4 && preg_match("/\d{4}/", $dateRaw) === 1) {
    $da = date_parse($dateRaw . "-01-01");
    $da["month"] = false;
    $da["day"] = false;
  }
  else {
    $da = date_parse($dateRaw);
    if ($da) {
      // If we have a suspicious "day 1", check for the three formats above:
      if ($da["day"] === 1) {
        // Hat tip to http://regex101.com
        // We're not actually matching to monthtext (which is looooong), 
        // just looking for alphabetic characters
        if ((preg_match("/^\d{4}\-(0?[0-9]|1[0-2])$/", $dateRaw) === 1) ||
            (preg_match("/^[a-zA-Z]+[ .\t-]*\d{4}$/", $dateRaw) === 1) ||
            (preg_match("/^\d{4}[ .\t-]*[a-zA-Z]+$/", $dateRaw) === 1)) {
              $da["day"] = false;
        }
      }
    }
  }
  return $da;
}
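
The three patterns can also be exercised on their own, which makes the January case easy to check without calling the parser. This is a sketch under the same simplification as above (any run of letters stands in for monthtext); `has_no_day` is a hypothetical name, not something from PHP or from the answers here.

```php
<?php
// Hypothetical helper: true when $dateRaw matches one of the three
// day-less formats (gnudateshorter, datenoday, datenodayrev).
// "monthtext" is approximated by any run of letters.
function has_no_day($dateRaw) {
  $dateRaw = trim($dateRaw);
  $patterns = array(
    "/^\d{4}-(0?[0-9]|1[0-2])$/",   // year4 "-" month
    "/^[a-zA-Z]+[ .\t-]*\d{4}$/",   // monthtext, separators, year4
    "/^\d{4}[ .\t-]*[a-zA-Z]+$/",   // year4, separators, monthtext
  );
  foreach ($patterns as $pattern) {
    if (preg_match($pattern, $dateRaw) === 1) {
      return true;
    }
  }
  return false;
}

$samples = array("2014-01", "Jan 2014", "2014 Jan", "2014-01-01", "1 Jan 2014", "5/2016");
foreach ($samples as $s) {
  printf("%-12s %s\n", $s, has_no_day($s) ? "no day" : "day given or other format");
}
```

Because the check looks only at the raw string, it reports 2014-01 as having no day even though the parser returns day 1 for it.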