Is "00" a legal month and/or day value for XML Schema date datatype?


I'm designing an XML bibliography and thinking about how to capture publishing dates. For most of the works I'm dealing with (books), the publishing date consists only of the year, but for some (journal articles) it's year and month, and for others (newspaper articles) it's year, month and day.

For simplicity, I'd like to use a single element to contain all three of these variants. Studying the spec (Appendix D.2 at http://www.w3.org/TR/xmlschema-2/), I see that if my element is of the date datatype, I can't just omit the day and/or month when I don't need them (the way ISO 8601 allows you to do), because those representations are used for different datatypes (gYearMonth and gYear, respectively).

But can I use zeroes for the unneeded values? Like this:

<pubdate>2009-04-00</pubdate>
<pubdate>2007-00-00</pubdate>

The spec explicitly prohibits "0000" as a year value (Appendix D.3) but doesn't say anything one way or the other about zeroes for month and day.

I suspect the answer to my question is no, because date values are supposed to correspond to intervals exactly one day long (spec section 3.2.9). But I still wanted to ask, both to make sure I don't needlessly discard a valid approach and because I haven't seen this exact question addressed elsewhere.

The closest thing I've found is this: http://www.biglist.com/lists/xsl-list/archives/200408/msg00297.html. One solution proposed there is to create an attribute for each part of the date, which I may end up doing if I can't use zeroes as I proposed above. Better ideas are welcome, of course.


There are 2 answers

Michael Kay answered:

You could define a union type with the member types xs:date, xs:gYearMonth, and xs:gYear, which would allow you to use values such as:

<pubdate>2013-12-12</pubdate>
<pubdate>2009-04</pubdate>
<pubdate>2007</pubdate>
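
A minimal sketch of such a union type might look like the following. The type name pubdateType is a hypothetical choice for illustration, and the element name pubdate is taken from the question; the rest of the schema is assumed.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type name. A value is valid if it matches any one
       member type: a full date, a year and month, or a year alone. -->
  <xs:simpleType name="pubdateType">
    <xs:union memberTypes="xs:date xs:gYearMonth xs:gYear"/>
  </xs:simpleType>

  <xs:element name="pubdate" type="pubdateType"/>

</xs:schema>
```

With this declaration, 2009-04 and 2007 validate as xs:gYearMonth and xs:gYear respectively, while 2009-04-00 is still rejected because it matches none of the three member types.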

kjhughes answered:

No, 00 is not a legal value for month or day per xsd:date; the examples you listed

<pubdate>2009-04-00</pubdate>
<pubdate>2007-00-00</pubdate>

would not be valid.

Observation #1:

You mentioned using attributes instead. I assume you mean a separate attribute for each part of the date, not simply moving the entire date string from an element into an attribute, because the typing issue is the same for elements and attributes. Either way, you could define a new type that allows the month and day to be omitted (which would be preferable to allowing 00).
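
As one possible reading of the attribute-per-part idea, here is a sketch in which the month and day are simply optional. The attribute names year, month, and day are assumptions for illustration, not something from the linked thread.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="pubdate">
    <xs:complexType>
      <!-- Hypothetical attribute names; only the year is required. -->
      <xs:attribute name="year" type="xs:gYear" use="required"/>
      <xs:attribute name="month" use="optional">
        <xs:simpleType>
          <!-- 1 through 12; 0 is excluded by xs:positiveInteger -->
          <xs:restriction base="xs:positiveInteger">
            <xs:maxInclusive value="12"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
      <xs:attribute name="day" use="optional">
        <xs:simpleType>
          <!-- 1 through 31; not checked against the month's length -->
          <xs:restriction base="xs:positiveInteger">
            <xs:maxInclusive value="31"/>
          </xs:restriction>
        </xs:simpleType>
      </xs:attribute>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

An instance would then look like <pubdate year="2009" month="4"/>. Note that XSD 1.0 cannot express rules such as "a day requires a month" or "no February 30" with this design; a processor that supports XSD 1.1 assertions could add such checks.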

Observation #2:

Dates can be very messy, especially if the source is uncontrolled legacy data. You may want to normalize, as far as possible, to a strict format with optional month and day components, while also supporting an unconstrained text capture of the date as originally presented for cases where incomplete or ambiguous data makes normalization impossible. Dates originating from unconstrained user input or OCR can be particularly challenging to shoehorn into a standard format.
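
One way to sketch this dual capture, assuming the original text goes in the element content and the normalized value in an optional attribute (the attribute name normalized is hypothetical):

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="pubdate">
    <xs:complexType>
      <xs:simpleContent>
        <!-- Element content: the date exactly as printed, unconstrained -->
        <xs:extension base="xs:string">
          <!-- Hypothetical attribute: the strict form, present only
               when the date could be normalized -->
          <xs:attribute name="normalized" use="optional">
            <xs:simpleType>
              <xs:union memberTypes="xs:date xs:gYearMonth xs:gYear"/>
            </xs:simpleType>
          </xs:attribute>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

An entry such as <pubdate normalized="1923-03">March, 1923</pubdate> carries both forms, while <pubdate>Spring, early 1920s</pubdate> keeps the original wording without forcing a guess at the normalized value.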