Docx4J: file from LibreOffice has artifacts on Windows

I'm trying to convert a .docx file to .pdf. For this I use the Docx4J library and a template created in LibreOffice (on Ubuntu).

WordprocessingMLPackage wordprocessingMLPackage = WordprocessingMLPackage.load(docxReport);
Docx4J.toPDF(wordprocessingMLPackage, reportStream);

Everything works on Ubuntu: I get a .pdf file with the correct content.

But if I run the application on Windows with the template from LibreOffice, I see artifacts like:

######!
####.

I get the same effect with a file generated by MS Word. If I run the application on a Linux machine with a template created on Windows, the output contains

######!

Here attached template from LibreOffice - https://dropmefiles.com/oH88X

Do you know what needs to be set up so that the conversion from .docx to .pdf works correctly regardless of the OS and of whether the template comes from LibreOffice or MS Word?

There is 1 answer

JasonPlutext

As commenters have suggested, a "#" indicates that your docx is using a font not present on the system.

To help with this, docx4j has a concept of a font Mapper.

// Set up font mapper (optional).
// wordMLPackage is the WordprocessingMLPackage loaded from the docx.
// Mapper fontMapper = new IdentityPlusMapper();  // Only for Windows, unless you have Microsoft's fonts installed
Mapper fontMapper = new BestMatchingMapper();  // Good for Linux (and OSX?)
wordMLPackage.setFontMapper(fontMapper);

IdentityPlusMapper is suitable for use on Windows and for documents which use Microsoft's standard fonts; it assumes the font names in the docx match a font installed on the system.

BestMatchingMapper uses the Panose system to choose a font which is "close" to the one specified.

Your choice of font Mapper just sets up the basic strategy.
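
Since the question asks for a setup that does not depend on the OS, one option is to pick the strategy at runtime. The following is only a sketch in plain Java: the class `MapperChoice`, the enum `MapperStrategy` and the method `forOs` are hypothetical names, not part of docx4j, and the rule (Windows gets the identity mapping, everything else gets best matching) simply mirrors the comments in the snippet above. It would pick the wrong strategy on, say, a Linux machine that does have Microsoft's fonts installed.

```java
import java.util.Locale;

public class MapperChoice {

    /** Hypothetical labels standing in for the two docx4j mapper classes. */
    public enum MapperStrategy { IDENTITY_PLUS, BEST_MATCHING }

    /** Picks a strategy from an OS name such as the "os.name" system property. */
    public static MapperStrategy forOs(String osName) {
        String os = osName == null ? "" : osName.toLowerCase(Locale.ROOT);
        // Assumption: only Windows can be relied on to have Microsoft's standard fonts.
        return os.startsWith("windows")
                ? MapperStrategy.IDENTITY_PLUS
                : MapperStrategy.BEST_MATCHING;
    }

    public static void main(String[] args) {
        MapperStrategy strategy = forOs(System.getProperty("os.name"));
        // In the real application this would translate to:
        //   IDENTITY_PLUS -> new IdentityPlusMapper()
        //   BEST_MATCHING -> new BestMatchingMapper()
        // followed by wordMLPackage.setFontMapper(fontMapper).
        System.out.println("Chosen strategy: " + strategy);
    }
}
```

Keeping the decision in one small method makes it easy to replace the OS test later with something more precise, such as checking whether the fonts the template needs are actually installed.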

You can then specify a mapping for a particular font name to a font actually present.

PhysicalFont font = PhysicalFonts.get("Arial Unicode MS"); // just an example
if (font != null) {
    fontMapper.put("HelveticaNeue", font);
}

Of course, you could choose to install the relevant fonts on the system. In this case, docx4j can discover them.
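
To find out which fonts would need installing, a quick check along these lines may help. It is a minimal sketch using only the JDK: the class `MissingFonts`, the method `missing` and the list of wanted fonts are hypothetical and should be replaced with the fonts your template really uses. It asks Java AWT for the installed font families, whereas docx4j runs its own font discovery, so treat the result as a rough diagnostic and not as what docx4j will see.

```java
import java.awt.GraphicsEnvironment;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class MissingFonts {

    /** Returns the wanted font names not found in installed (case-insensitive), sorted. */
    public static Set<String> missing(Collection<String> wanted, Collection<String> installed) {
        Set<String> have = installed.stream()
                .map(name -> name.toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        Set<String> result = new TreeSet<>();
        for (String name : wanted) {
            if (!have.contains(name.toLowerCase(Locale.ROOT))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Font families visible to Java AWT on this machine.
        List<String> installed = Arrays.asList(
                GraphicsEnvironment.getLocalGraphicsEnvironment().getAvailableFontFamilyNames());
        // Hypothetical list: replace with the fonts used by your template.
        List<String> wanted = Arrays.asList("Arial", "Calibri", "Times New Roman");
        System.out.println("Missing: " + missing(wanted, installed));
    }
}
```

Running this on both the Ubuntu and the Windows machine should show which font names differ between them; those are the ones to install or to map explicitly.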

Alternatively, you might embed the font in the docx.