Unable to insert EMF into Word using Python


I need to insert an SVG file into a Word document. Since this cannot be done directly, I plan to convert the SVG to EMF and insert that instead. The conversion from SVG to EMF works fine with Inkscape, but I cannot come up with the right code for inserting the EMF into Word. I followed the steps that Alvaro explained in this post; the steps I followed are shown in the attached screenshot.

This is my code (attached as a screenshot).

However, when I run the code shown in the attachment, it still throws docx.image.exceptions.UnrecognizedImageError. The contributor of the library on GitHub claims that the library addresses this issue. If so, please let me know what I am missing.

I can insert the EMF file into Word manually without any problem, and I am attaching the resulting document. This EMF was downloaded from the internet for testing.

There are 3 answers

Yuri Khristich On BEST ANSWER

Here is another solution based on the win32com module and the MS Word API:

from pathlib import Path
import win32com.client

cur_dir  = Path.cwd()                                   # get current folder
pictures = list((cur_dir / "pictures").glob("*.emf"))   # get a list of pictures
word_app = win32com.client.Dispatch("Word.Application") # run Word
doc      = word_app.Documents.Add()                     # create a new docx file

for pict in pictures:                                   # insert all pictures
    doc.InlineShapes.AddPicture(str(pict))              # COM needs the path as a string

doc.SaveAs(str(cur_dir / "pictures.docx"))              # save the docx file
doc.Close()                                             # close docx
word_app.Quit()                                         # close Word

Put your EMF images in a subfolder named pictures and run this script. Afterwards the current folder contains the file pictures.docx with all of these EMF images inside.
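A slightly more defensive variant of the same idea, as a sketch only. The helper names `emf_paths` and `insert_pictures` are made up for illustration. It assumes pywin32 and Word are installed, passes absolute string paths to Word, and wraps the work in try/finally so that Word is closed even if an insert fails.

```python
from pathlib import Path


def emf_paths(folder):
    """Return absolute paths of the *.emf files in folder, as strings, sorted by name."""
    return [str(p.resolve()) for p in sorted(Path(folder).glob("*.emf"))]


def insert_pictures(folder, out_docx):
    """Insert every EMF found in folder into a new Word document (Windows only)."""
    import win32com.client  # imported here so the helper above works without pywin32

    word_app = win32com.client.Dispatch("Word.Application")
    try:
        doc = word_app.Documents.Add()
        for path in emf_paths(folder):
            doc.InlineShapes.AddPicture(path)
        doc.SaveAs(str(Path(out_docx).resolve()))
        doc.Close()
    finally:
        word_app.Quit()  # do not leave a hidden Word process behind
```

Calling `insert_pictures("pictures", "pictures.docx")` would then do the same job as the script above.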

Yuri Khristich On

It seems the docx module doesn't work with EMF files.

The workaround I have in mind is this:

import shutil
import zipfile

temp_dir = "_temp"

old_docx = "doc.docx"
new_docx = "doc_new.docx"

old_emf = temp_dir + "/word/media/image1.emf"
new_emf = "new_image.emf"


# unpack content of the docx file into the temp folder

with zipfile.ZipFile(old_docx, "r") as z:
    files = z.namelist()
    for f in files: z.extract(f, temp_dir)


# replace the image

shutil.copyfile(new_emf, old_emf)


# pack all files from temp folder back into the new docx file

with zipfile.ZipFile(new_docx, "w", zipfile.ZIP_DEFLATED) as z:  # "w" so a rerun starts from an empty archive
    for f in files: z.write(temp_dir + "/" + f, f)


# remove the temp folder

shutil.rmtree(temp_dir)

Typical structure of a docx file:

doc.docx
│
├─ [Content_Types].xml
│
├─ _rels
│  └─ .rels
│
├─ docProps
│  ├─ app.xml
│  └─ core.xml
│
└─ word
   ├─ document.xml    <-- text is here
   ├─ fontTable.xml
   ├─ settings.xml
   ├─ webSettings.xml
   ├─ styles.xml
   │
   ├─ _rels
   │  └─ document.xml.rels
   │
   ├─ theme
   │  └─ theme1.xml
   │
   └─ media
      └─ image1.emf   <-- your image is here

The script unpacks the contents of doc.docx into the temporary folder _temp, replaces image1.emf inside that folder with new_image.emf from the current directory, packs the contents of the temp folder back into doc_new.docx, and removes the temp folder.

Note: the new image will have the same size in doc_new.docx as the old one.

So the workflow can look like this: make a template docx file, manually place a template EMF picture in it, and save the file. Then put the new EMF image next to the docx file and run the script. This way you get a new docx file with the new EMF image.
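The script assumes that the template picture is stored as word/media/image1.emf. Before running it, that name can be checked with a few lines. This is only a sketch, and `media_parts` is a hypothetical helper name:

```python
import zipfile


def media_parts(docx_path):
    """Return the names of all parts stored under word/media/ in a docx file."""
    with zipfile.ZipFile(docx_path) as z:
        return [name for name in z.namelist() if name.startswith("word/media/")]
```

If `media_parts("doc.docx")` lists a different name than image1.emf, adjust old_emf in the script accordingly.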

I suppose you have many EMF images, so it makes sense to add a couple of lines to this script so that it can take several images and produce several docx files.

This works fine if all the EMF images have the same size. If the sizes differ, it takes more code to handle the XML data.
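The batch idea could be sketched roughly as follows. This is an illustration, not a tested production script. The function names `replace_emf` and `make_documents` and the default part name are assumptions, and the archive is copied entry by entry instead of going through a temp folder.

```python
import zipfile
from pathlib import Path


def replace_emf(template_docx, new_emf, out_docx, part="word/media/image1.emf"):
    """Copy template_docx to out_docx, swapping one media part for new_emf."""
    new_bytes = Path(new_emf).read_bytes()
    with zipfile.ZipFile(template_docx, "r") as src:
        if part not in src.namelist():
            raise KeyError(f"{part} not found in {template_docx}")
        with zipfile.ZipFile(out_docx, "w", zipfile.ZIP_DEFLATED) as dst:
            for item in src.infolist():
                # keep every entry as it is, except the image being replaced
                data = new_bytes if item.filename == part else src.read(item)
                dst.writestr(item, data)


def make_documents(template_docx, emf_folder, out_folder):
    """Create one docx per EMF file in emf_folder; return the output paths."""
    out_folder = Path(out_folder)
    out_folder.mkdir(parents=True, exist_ok=True)
    outputs = []
    for emf in sorted(Path(emf_folder).glob("*.emf")):
        out = out_folder / (emf.stem + ".docx")
        replace_emf(template_docx, emf, out)
        outputs.append(out)
    return outputs
```

For example, `make_documents("doc.docx", "pictures", "out")` would write out/<name>.docx for every pictures/<name>.emf. Each picture keeps the size of the template picture.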

Update

I've figured out how to get the size of an EMF image, so here is the full solution:

from docx import Document
import shutil
import zipfile

temp_dir = "_temp"
old_docx = "doc.docx"
new_docx = "doc_new.docx"
old_emf  = temp_dir + "/word/media/image1.emf" # don't change this line
new_emf  = "img5.emf"

# unpack content of the docx file into temp folder
with zipfile.ZipFile(old_docx, "r") as z:
    files = z.namelist()
    for f in files: z.extract(f, temp_dir)

# replace the image
shutil.copyfile(new_emf, old_emf)

# pack all files from temp folder back into the new docx file
with zipfile.ZipFile(new_docx, "w", zipfile.ZIP_DEFLATED) as z:  # "w" so a rerun starts from an empty archive
    for f in files: z.write(temp_dir + "/" + f, f)

# remove temp folder
shutil.rmtree(temp_dir)

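# get sizes of the emf image: the header stores the bounds rectangle
# (left, top, right, bottom) as little-endian 32-bit integers from offset 8
with open(new_emf, "rb") as f:
    f.read(16)                                   # skip to bounds.right
    w1, w2 = f.read(1).hex(), f.read(1).hex()    # low two bytes of right
    f.read(2)                                    # skip the high two bytes
    h1, h2 = f.read(1).hex(), f.read(1).hex()    # low two bytes of bottom

# 762 = 914400 EMU per inch / 1200, i.e. this assumes bounds in 1/1200 inch units
width  = int(str(w2) + str(w1), 16) * 762
height = int(str(h2) + str(h1), 16) * 762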

# open the new docx file and set the sizes for the image
doc = Document(new_docx)
img = doc.inline_shapes[0]  # assumes the first inline image is the replaced one
img.width  = width
img.height = height

doc.save(new_docx)
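An alternative way to read the size, sketched with the struct module. Note that this is a different technique from the script above: instead of the bounds rectangle scaled by 762, it reads the frame rectangle that the EMF header stores at offset 24 in units of 0.01 mm and converts it to EMU (1 mm = 36000 EMU, so 0.01 mm = 360 EMU). The function name `emf_size_emu` is hypothetical, and whether the frame or the bounds gives the better result may depend on the program that produced the EMF.

```python
import struct

EMU_PER_HUNDREDTH_MM = 360  # 914400 EMU per inch / 2540 hundredths of a mm per inch


def emf_size_emu(header):
    """Return (width, height) in EMU from the first 44 bytes of an EMF file."""
    if len(header) < 44:
        raise ValueError("EMF header too short")
    (record_type,) = struct.unpack_from("<I", header, 0)
    # a valid EMF starts with record type 1 and carries " EMF" at offset 40
    if record_type != 1 or header[40:44] != b" EMF":
        raise ValueError("not an EMF file")
    # frame rectangle: left, top, right, bottom as signed 32-bit little-endian
    left, top, right, bottom = struct.unpack_from("<4i", header, 24)
    width = (right - left) * EMU_PER_HUNDREDTH_MM
    height = (bottom - top) * EMU_PER_HUNDREDTH_MM
    return width, height
```

With `header = open(new_emf, "rb").read(44)`, the two returned values could be assigned to img.width and img.height in place of the values computed above.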
kiwiwings On

SVG can be added to Word directly - just try it out manually in Word (2016). I've created an example Java project as a POC for your use case. No need to call inkscape, because the fallback PNG is created on the fly via Batik.

Of course the OP asked for a Python solution, but if python-openxml is missing some functionality, there may be a point where getting it to run in Python takes more effort than invoking a Java runtime.

Regarding the workaround via EMF: be aware that there are various methods of determining the bounds. In the EMF renderer that I implemented in POI, I scan through the Window and Viewport records by default and only use the EMF header bounds if I couldn't find anything else or if the scan is disabled via a configuration option. This usually gives me better results.

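The relevant code snippet of the example project is the following (imports are omitted; DEFAULT_XML_OPTIONS and IMAGE_PART appear to be static imports, in POI these are POIXMLTypeLoader.DEFAULT_XML_OPTIONS and PackageRelationshipTypes.IMAGE_PART):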

public class AddSvgToDocument {
    public static void main(String[] args) throws IOException, InvalidFormatException {
        File tmplDocx = new File(args[0]);
        File svgFile = new File(args[1]);
        File outDocx = new File(args[2]);

        try (FileInputStream fis = new FileInputStream(tmplDocx);
             XWPFDocument doc = new XWPFDocument(fis)) {

            SVGImageRenderer rnd = new SVGImageRenderer();
            try (FileInputStream fis2 = new FileInputStream(svgFile)) {
                rnd.loadImage(fis2, PictureData.PictureType.SVG.contentType);
            }

            Rectangle2D nativeDim = rnd.getNativeBounds();
            double widthPx = 500;
            double heightPx = widthPx * nativeDim.getHeight() / nativeDim.getWidth();

            BufferedImage bi = rnd.getImage(new Dimension2DDouble(widthPx, heightPx));
            ByteArrayOutputStream bos = new ByteArrayOutputStream(100_000);
            ImageIO.write(bi, "PNG", bos);

            XWPFRun run = doc.createParagraph().createRun();

            int widthEmu = Units.pixelToEMU((int)widthPx);
            int heightEmu = Units.pixelToEMU((int)heightPx);
            XWPFPicture pic = run.addPicture(new ByteArrayInputStream(bos.toByteArray()), PictureData.PictureType.PNG.ooxmlId, "image.png", widthEmu, heightEmu);
            CTOfficeArtExtensionList extLst = pic.getCTPicture().getBlipFill().getBlip().addNewExtLst();
            addExt(extLst, "{28A0092B-C50C-407E-A947-70E740481C1C}"
                , "http://schemas.microsoft.com/office/drawing/2010/main", "a14:useLocalDpi"
                , "val", "0");

            addExt(extLst, "{96DAC541-7B7A-43D3-8B79-37D633B846F1}"
                , "http://schemas.microsoft.com/office/drawing/2016/SVG/main", "asvg:svgBlip"
                , "r:embed", addSVG(doc, svgFile));

            try (FileOutputStream fos = new FileOutputStream(outDocx)) {
                doc.write(fos);
            }
        }
    }



    private static void addExt(CTOfficeArtExtensionList extLst, String uri, String namespace, String name, String attribute, String value) {
        CTOfficeArtExtension ext = extLst.addNewExt();
        ext.setUri(uri);
        XmlCursor cur = ext.newCursor();
        cur.toEndToken();
        String[] prefixName = name.split(":");
        cur.beginElement(new QName(namespace, prefixName[1], prefixName[0]));
        cur.insertNamespace(prefixName[0], namespace);
        if (attribute.contains(":")) {
            prefixName = attribute.split(":");
            String prefix = prefixName[0];
            String attrNamespace = DEFAULT_XML_OPTIONS
                .getSaveSuggestedPrefixes().entrySet().stream()
                .filter(me -> prefix.equals(me.getValue()))
                .map(Map.Entry::getKey)
                .findFirst().orElse(null);
            cur.insertAttributeWithValue(new QName(attrNamespace, prefixName[1], prefix), value);
        } else {
            cur.insertAttributeWithValue(attribute, value);
        }
        cur.dispose();
    }

    private static String addSVG(XWPFDocument doc, File svgFile) throws InvalidFormatException, IOException {
        // SVG is not thoroughly supported as of POI 5.0.0, hence we need to go the long way instead of adding a picture
        OPCPackage pkg = doc.getPackage();
        String svgNameTmpl = "/word/media/image#.svg";
        int svgImageIdx = pkg.getUnusedPartIndex(svgNameTmpl);
        PackagePartName svgPPName = PackagingURIHelper.createPartName(svgNameTmpl.replace("#", Integer.toString(svgImageIdx)));
        PackagePart svgPart = pkg.createPart(svgPPName, PictureData.PictureType.SVG.contentType);

        try (FileInputStream fis = new FileInputStream(svgFile);
             OutputStream os = svgPart.getOutputStream()) {
            IOUtils.copy(fis, os);
        }
        PackageRelationship svgRel = doc.getPackagePart().addRelationship(svgPPName, TargetMode.INTERNAL, IMAGE_PART);
        return svgRel.getId();
    }
}