How to identify and validate an OOXML file?


I need to be able to identify that a given file is an OOXML file based on the contents of the file, and not on the file's extension.

OOXML files are really a collection of XML and text files in a zip container, which means that I cannot use the file's magic number as it will just indicate that it is a zip file.

So what I'm really asking is: are there any files that are required to be present in an OOXML Open Packaging Convention (OPC) container? If so, the presence of such a file in an OPC container indicates that it is likely to be an OOXML file, and its absence indicates that it definitely is not an OOXML file.

This question is the OOXML version of this ODF question.


There are 3 answers

0
Todd Main On BEST ANSWER

Yes, there is a way. Go to OpenXMLDeveloper.org and download the PPTX titled "02: Open XML Packages" (Presentation 02). Slide 12 explains how to identify an Open XML document: look at the main document part (document.xml for Word), the .rels relationship files, and the [Content_Types].xml file, most importantly the ContentType values it declares. The important thing here is to inspect what is inside these files, not just the package structure itself (Open Packaging Convention).
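As a rough illustration of that check, here is a minimal Python sketch using only the standard library. It opens the file as a zip, requires [Content_Types].xml, and looks for an Override whose ContentType matches the main part of a Word, Excel, or PowerPoint document. The function name `sniff_ooxml` and the `MAIN_PART_TYPES` table are my own names, not from the presentation, and the table only covers the three plain document types; templates and macro-enabled files declare different content types and would need extra entries.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in an OPC package.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"

# Main-part content types for the three common document kinds.
# Assumption: templates and macro-enabled variants are deliberately left out.
MAIN_PART_TYPES = {
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document.main+xml": "docx",
    "application/vnd.openxmlformats-officedocument"
    ".spreadsheetml.sheet.main+xml": "xlsx",
    "application/vnd.openxmlformats-officedocument"
    ".presentationml.presentation.main+xml": "pptx",
}


def sniff_ooxml(source):
    """Return 'docx', 'xlsx' or 'pptx' for a recognised package, else None.

    `source` may be a filesystem path or a seekable binary file object.
    """
    try:
        with zipfile.ZipFile(source) as zf:
            # Every OPC package must carry this part at the zip root.
            if "[Content_Types].xml" not in zf.namelist():
                return None
            root = ET.fromstring(zf.read("[Content_Types].xml"))
    except (zipfile.BadZipFile, ET.ParseError):
        # Not a zip at all, or the content-types part is not valid XML.
        return None

    # The main document part is declared through an Override element.
    for override in root.iter(CT_NS + "Override"):
        kind = MAIN_PART_TYPES.get(override.get("ContentType"))
        if kind is not None:
            return kind
    return None
```

Calling `sniff_ooxml("report.docx")` (a hypothetical path) returns the detected kind or `None`. This only identifies the file; it does not validate it against the schemas.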

Another great resource is Open XML Markup Explained. Chapter 1, followed by "Setting Up the Main Document", is a great place to learn about the structure of a Word docx. The Excel and PowerPoint structures are covered later on.

0
Ajit On

OOXMLValidator is a relatively new tool that I have used to validate OOXML files. It has helped me identify potential issues in a batch of OOXML files.

2
Amber On

A similar answer to the one I gave to your ODF question: look at the technical specification of the format.