I am trying to convert a .doc file to PDF. To do this, I first convert the .doc to XSL-FO and then the XSL-FO to PDF (.doc > XSL-FO > PDF).
When converting the .doc to XSL-FO, drawn objects such as checkboxes, rectangles, and squares are not converted.
The output looks like the following, where there should actually be a box:
The conversion code I am using is:
// Load the .doc file and convert it to an XSL-FO DOM document
HWPFDocumentCore wordDocument = WordToFoUtils.loadDoc(is);
WordToFoConverter wordToFoConverter = new WordToFoConverter(
        XMLHelper.getDocumentBuilderFactory().newDocumentBuilder().newDocument());
wordToFoConverter.processDocument(wordDocument);
File foFile = new File("D:\\Testing\\testing\\" + "test.fo");

// Serialize the XSL-FO DOM into an in-memory buffer
ByteArrayOutputStream out = new ByteArrayOutputStream();
StreamResult streamResult = new StreamResult(out);
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.transform(new DOMSource(wordToFoConverter.getDocument()), streamResult);

// Decode as UTF-8, apply NFD normalization, collapse whitespace, then URL-encode
String result = org.apache.commons.lang3.StringUtils.normalizeSpace(
        java.text.Normalizer.normalize(
                new String(out.toByteArray(), "UTF-8"), java.text.Normalizer.Form.NFD));
result = URLEncoder.encode(result, "UTF-8");
Apache FOP is then used to convert the .fo file to PDF.
The .doc file looks like this:
and WordToFoConverter converted the boxes as follows:
In plain text such as XML, checkboxes usually come from basic symbol fonts.
They appear as ☐ when unchecked, or as ☑ or ☒ when checked.
In any basic text stream it should be relatively easy to find and replace them. However, beware of the encoding, especially UTF; the characters are best copied from a clean set such as Zapf Dingbats or the Adobe TTF Symbol font.
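As a rough illustration of the find-and-replace idea, here is a minimal Java sketch. The placeholder strings `[ ]` and `[x]`, the class name, and the sample FO fragment are all hypothetical, since what the converter actually emits for a drawn box is not shown here. The characters are written as Unicode escapes so the source file's own encoding cannot corrupt them, and the charset is always named explicitly when converting to bytes.

```java
import java.nio.charset.StandardCharsets;

class CheckboxReplace {
    // Hypothetical placeholders; the real converter output may look different.
    static final String UNCHECKED_MARK = "[ ]";
    static final String CHECKED_MARK = "[x]";

    // Swap the placeholders for the ballot box characters
    // U+2610 (unchecked) and U+2612 (checked with X).
    static String replaceCheckboxes(String text) {
        return text.replace(UNCHECKED_MARK, "\u2610")
                   .replace(CHECKED_MARK, "\u2612");
    }

    public static void main(String[] args) {
        // Sample fragment, assumed for illustration only.
        String fo = "<fo:block>[ ] Option A [x] Option B</fo:block>";
        String replaced = replaceCheckboxes(fo);

        // Always name the charset; relying on the platform default is
        // a common way for these symbols to get mangled.
        byte[] utf8 = replaced.getBytes(StandardCharsets.UTF_8);
        String roundTrip = new String(utf8, StandardCharsets.UTF_8);
        System.out.println("Round trip intact: " + roundTrip.equals(replaced));
    }
}
```

Whether the symbols then render in the PDF still depends on the font chosen in the FO file containing those glyphs, so a visual check remains necessary.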
Many have a Unicode description, but do test visually that they work after copying and pasting from the PDF, since the font mapping may not always tally.
8999 ⌧ ⌧ \002327 0x2327 X in a rectangle box
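To check the decimal and hexadecimal values for characters like the one above, a small Java sketch such as the following can print them alongside the name the JDK reports. The class and method names are made up for illustration, and the console may show the glyph incorrectly if its encoding or font does not support it.

```java
class CodePointTable {
    // Format one code point as: decimal, the character itself,
    // hexadecimal, and the Unicode name known to the JDK.
    static String describe(int codePoint) {
        return String.format("%d  %s  0x%04X  %s",
                codePoint,
                new String(Character.toChars(codePoint)),
                codePoint,
                Character.getName(codePoint));
    }

    public static void main(String[] args) {
        // The three ballot boxes plus the X-in-a-rectangle character.
        int[] boxes = {0x2610, 0x2611, 0x2612, 0x2327};
        for (int cp : boxes) {
            System.out.println(describe(cp));
        }
    }
}
```

The decimal and hexadecimal columns are the same number in two bases, so 8999 and 0x2327 identify the same character.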
By far the simplest way to use Unicode text is as Rich Text, which on the Windows command line you can export as a Portable Document File (PDF) using Write.exe, which can read TXT and /PrintTo PDF. (You don't need the lower-left dialog; it is only there to illustrate the export settings.)
It's much simpler than XML, where just one character requires: