Prevent Word 2010 from saving o:gfxdata base64 or uuencoded VML?


I am working with .docx files containing several drawing canvases with images inserted and some lines and arrows drawn in Word 2010. I am using 2010 format with no compatibility mode.

Word inserts an o:gfxdata attribute into each v:shape and v:group element and fills it with what looks like ASCII-encoded data. From what I have read, it may be a copy of the VML describing the v:shape or v:group. Perhaps I just don't know what to look for, but I cannot determine what this data is for: removing it has no apparent effect on my ability to read or edit the document in Word 2003, 2007, or 2010.

It does swell the document.xml to almost twice the (apparent) necessary size. This considerably slows OpenTBS' processing so I would like to remove it, if possible. Does anyone know of a way to tell Word 2010 to quit saving this extra data? Or what it is for? I have really struggled to find any documentation on it beyond this post.

Edit:

Here is a sample .docx. The document.xml is ~141KB and OpenTBS takes an average of 10.35 seconds to create a file that includes this as a subtemplate 21 times. If I remove all of the o:gfxdata attributes, the file size is reduced to ~37KB and OpenTBS takes only 2.99 seconds to produce the same file.

Edit 2:

After further investigation, it appears that removing the o:gfxdata attributes may cause Word 2003 with an older Compatibility Pack installed to object to the file with the following error:

"This is a pre-release version of the Compatibility Pack and can open pre-release Office 2007 files only. Do you want to check for a newer version of the Compatibility Pack?"

I have been able to open the file by installing a newer Compatibility Pack, though it prompts the user about the incompatibility and converts the file in order to open it. This does not damage my file, but it is something to look out for.

1 answer

Answer by Skrol29 (best answer):

The o:gfxdata attribute is poorly documented on the web. According to your investigations, it holds some kind of extra compatibility information.

You can delete those attributes from your template using OpenTBS. The cleaning can be done once, without any merging, and the cleaned result saved as a new template. Alternatively, you can perform the cleaning each time you open the template.

Cleaning the DOCX file:

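// Repeat while a start tag carrying an o:gfxdata attribute is still found,
// searching from the beginning of the source each time.
while ($x = clsTbsXmlLoc::FindStartTagHavingAtt($TBS->Source, 'o:gfxdata', 0) ) {
  // Empty the attribute's value...
  $x->ReplaceAtt('o:gfxdata', '');
  // ...then drop the now-empty attribute from the source.
  $TBS->Source = str_replace(' o:gfxdata=""', '', $TBS->Source);
}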

Note that the class clsTbsXmlLoc is provided with OpenTBS and is undocumented. The code should work with OpenTBS 1.8.0 or later (1.8.0 is currently in stable beta).

I've noticed that once the o:gfxdata attributes are deleted, they do not come back immediately when you edit the docx.
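
If you would rather clean the template once, outside of OpenTBS, the same idea can be sketched with plain PHP. This is only an illustration and not part of OpenTBS: the function names strip_gfxdata and clean_docx are hypothetical, the regular expression assumes Word writes the attribute value in double quotes, and clean_docx assumes the PHP zip extension is available and that the main part is stored at word/document.xml.

```php
<?php
// Hypothetical helper (not an OpenTBS function): remove every o:gfxdata="..."
// attribute from a WordprocessingML string.
// Assumption: attribute values are double-quoted.
function strip_gfxdata(string $xml): string
{
    // \s+ consumes the whitespace before the attribute name;
    // [^"]* spans the encoded payload up to the closing quote.
    $clean = preg_replace('/\s+o:gfxdata="[^"]*"/', '', $xml);

    // preg_replace returns null on error; fall back to the untouched input.
    return $clean ?? $xml;
}

// Hypothetical helper: copy a .docx and clean word/document.xml in the copy.
// Assumption: the zip extension (ZipArchive) is installed.
function clean_docx(string $source, string $target): bool
{
    if (!copy($source, $target)) {
        return false;
    }

    $zip = new ZipArchive();
    if ($zip->open($target) !== true) {
        return false;
    }

    $xml = $zip->getFromName('word/document.xml');
    if ($xml === false) {
        $zip->close();
        return false;
    }

    // Write the cleaned XML back under the same entry name.
    $zip->addFromString('word/document.xml', strip_gfxdata($xml));

    return $zip->close();
}
```

For example, clean_docx('template.docx', 'template_clean.docx') would write a cleaned copy next to the original (both file names are placeholders), which can then be used as the new template. Given the Compatibility Pack warning mentioned in the question, keeping the original file around seems prudent.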