MS Word 2003 and 2007 XML differences


Can someone explain the differences between the XML structures of MS Word 2003 and 2007? Cheers! :)


This is a very broad question, so it is difficult to know what kind of explanation you are looking for, but the main points are arguably:

a. Word 2003 XML files are plain XML: the whole document is saved as a single, uncompressed text file. Word 2007 can save to that format, but its native format (.docx) is a bundle of XML and other files (for example, image files) stored in a .zip file, with separate XML files for the main document body, headers/footers, footnotes, styles, document properties, and so on. Word 2007 can also save to its own single, uncompressed XML format, often referred to as "Flat OPC."

b. The primary namespace URI used in the Word 2003 format is http://schemas.microsoft.com/office/word/2003/wordml

The primary namespace URI used in the Word 2007 format for the main document is http://schemas.openxmlformats.org/wordprocessingml/2006/main

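Several other namespace URIs are used as well. In the Word 2007 format the primary namespace depends on the kind of part in the ZIP: the WordprocessingML parts (main document, styles, headers/footers, footnotes) all use the main namespace above, while other parts, such as relationship files and document properties, have namespaces of their own.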

c. The Word 2003 format was not, AFAIK, standardised outside Microsoft. The Word 2007 format was the basis for two standards, ECMA-376 and ISO/IEC 29500. AFAICR, the Word 2007 format conforms, or nearly conforms, to ECMA-376. Only the .zip package format is standardised. The Flat OPC format is not, and AFAIK the additional XML namespace it uses has never been officially documented (not that it is difficult to understand).
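To see points a and b in practice, one quick way to tell the layouts apart is to check whether the file is a ZIP and then look at the namespace of each root element. This is a minimal sketch using only the Python standard library; the function names and the stripped-down sample files are hypothetical, and apart from the relationships namespace (added for the sample) it only recognises the namespace URIs quoted above.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Primary namespace URIs quoted in the answer.
WORDML_2003 = "http://schemas.microsoft.com/office/word/2003/wordml"
WML_2007_MAIN = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
# Relationship-part namespace (used only to build the sample package below).
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def root_namespace(xml_bytes):
    """Return the namespace URI of the root element ('' if it has none)."""
    tag = ET.fromstring(xml_bytes).tag  # ElementTree spells tags '{uri}local'
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def describe(data):
    """Classify raw file bytes and report the root namespace of each XML part."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        # Word 2007 native layout: a ZIP package of separate parts.
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            parts = {
                name: root_namespace(zf.read(name))
                for name in zf.namelist()
                if name.endswith((".xml", ".rels"))
            }
        return "zip package", parts
    # Otherwise treat the whole file as one XML document.
    ns = root_namespace(data)
    kind = "Word 2003 XML" if ns == WORDML_2003 else "single XML file"
    return kind, {"<file>": ns}


def build_samples():
    """Hypothetical, stripped-down samples (not complete, valid documents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml",
                    f'<w:document xmlns:w="{WML_2007_MAIN}"><w:body/></w:document>')
        zf.writestr("_rels/.rels", f'<Relationships xmlns="{PKG_REL_NS}"/>')
    word2003 = f'<?xml version="1.0"?><w:wordDocument xmlns:w="{WORDML_2003}"/>'.encode()
    return buf.getvalue(), word2003


if __name__ == "__main__":
    for sample in build_samples():
        print(describe(sample))
```

Run against a real .docx, the ZIP branch should also list parts such as [Content_Types].xml and the docProps files, each with its own root namespace. A Flat OPC file lands in the generic "single XML file" bucket, since this sketch does not check for its namespace.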

Because the Word 2007 format (Office Open XML, or OOXML) splits a Word document into multiple separate XML documents and other components (often known as "parts"), it also uses "relationship" files, which contain XML defining the relationships between one part and another. The relationships for a part are stored in a _rels folder next to it; for the main document that is word/_rels/document.xml.rels. For example, if the main document contains a picture, the XML for the main document needs to reference an image file part, but it typically does so with a relationship ID (in an r:embed attribute, for instance) rather than by naming the part directly.
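The relationship indirection is easy to follow by hand. As a minimal sketch, assuming a .docx-style ZIP package and only the Python standard library, the code below reads one part plus its .rels file and resolves every attribute in the relationships namespace (r:id, r:embed and the like) to a target. The function names and the tiny sample package are hypothetical illustrations, not a complete, valid .docx.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Namespace of relationship-ID attributes inside parts (r:id, r:embed, ...).
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
# Namespace of the root element of a relationships part (*.rels).
PKG_REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def rels_part_for(part):
    """'word/document.xml' -> 'word/_rels/document.xml.rels'."""
    folder, name = posixpath.split(part)
    return posixpath.join(folder, "_rels", name + ".rels")


def resolve_relationships(package, part="word/document.xml"):
    """Return (element, attribute, rId, target) for each r:* attribute in `part`."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read(part))
        rels = ET.fromstring(zf.read(rels_part_for(part)))

    folder = posixpath.dirname(part)
    targets = {}
    for rel in rels.iter(f"{{{PKG_REL_NS}}}Relationship"):
        target = rel.get("Target")
        if rel.get("TargetMode") == "External":
            resolved = target  # e.g. a hyperlink URL, left as-is
        elif target.startswith("/"):
            resolved = target.lstrip("/")  # absolute from the package root
        else:
            # Internal targets are relative to the source part's folder.
            resolved = posixpath.normpath(posixpath.join(folder, target))
        targets[rel.get("Id")] = resolved

    found = []
    for elem in root.iter():  # document order, depth-first
        for key, value in elem.attrib.items():
            if key.startswith(f"{{{R_NS}}}"):
                local_elem = elem.tag.rsplit("}", 1)[-1]
                local_attr = key.rsplit("}", 1)[-1]
                found.append((local_elem, local_attr, value, targets.get(value)))
    return found


def build_sample_package():
    """Hypothetical, stripped-down package: one image and one header reference."""
    w_ns = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    a_ns = "http://schemas.openxmlformats.org/drawingml/2006/main"
    # Simplified: in a real file a:blip sits deep inside a w:drawing element.
    document_xml = (
        f'<w:document xmlns:w="{w_ns}" xmlns:r="{R_NS}" xmlns:a="{a_ns}"><w:body>'
        '<w:p><w:r><a:blip r:embed="rId2"/></w:r></w:p>'
        '<w:sectPr><w:headerReference w:type="default" r:id="rId1"/></w:sectPr>'
        '</w:body></w:document>'
    )
    rels_xml = (
        f'<Relationships xmlns="{PKG_REL_NS}">'
        f'<Relationship Id="rId1" Type="{R_NS}/header" Target="header1.xml"/>'
        f'<Relationship Id="rId2" Type="{R_NS}/image" Target="media/image1.png"/>'
        '</Relationships>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", document_xml)
        zf.writestr("word/_rels/document.xml.rels", rels_xml)
    return buf.getvalue()


if __name__ == "__main__":
    for row in resolve_relationships(build_sample_package()):
        print(row)
```

With the sample package this prints one row for the blip (rId2, resolved to word/media/image1.png) and one for the headerReference (rId1, resolved to word/header1.xml). IDs are resolved per part rather than globally because each part has its own .rels file, so the same ID, such as rId1, can point to different targets in different parts.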