Read an XForm from an ODT file with Java


I am trying to read data from an .odt file (created with LibreOffice). The requirement is to get the XML that is bound to an XForm included in the document. I am currently using the odfdom-java library to read the file. So far I have managed to read the values of the form fields by parsing the document with jdom, but what I actually want is the whole XML containing the form data. Alternatively, I can load the file with

    OdfTextDocument.loadDocument("C://myFile.odt");

Does anyone know how I can get the XForm xml from there?

Alternatively, would it help if I converted the odt file to PDF programmatically? Using PDFBox I have managed to get the AcroForm:

    // load the converted PDF (PDFBox cannot parse the .odt itself)
    PDDocument pdDoc = PDDocument.loadNonSeq(new File("C://myFile.pdf"), null);
    PDDocumentCatalog pdCatalog = pdDoc.getDocumentCatalog();
    PDAcroForm pdAcroForm = pdCatalog.getAcroForm();

but I face the same problem afterwards (how to get the XML with the form data).


There is 1 answer

Answered by dchar

I have managed to do this via DOM parsing (odfdom-java was not used after all). The bound XML is itself part of the XML that represents the odt, namely content.xml inside the archive. All you need to know is the id of the form or the name of the tag in order to get the proper node. A string containing the XML with the form data is then built from that node. My code is as follows:

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class TestXFormData {

    private static StringBuilder nodeContent;

    public static void main(String[] args) throws Exception {
        // An .odt file is a zip archive; the document body is stored in content.xml
        try (ZipFile zipFile = new ZipFile("C://myFile.odt")) {
            ZipEntry entry = zipFile.getEntry("content.xml");
            if (entry == null) {
                throw new IllegalStateException("content.xml not found in the archive");
            }
            // construct document
            DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
            domFactory.setNamespaceAware(true);
            DocumentBuilder docBuilder = domFactory.newDocumentBuilder();
            Document doc;
            try (InputStream in = zipFile.getInputStream(entry)) {
                doc = docBuilder.parse(in);
            }
            // print the document
            printDocument(doc);
            // get the node; replace "myTagName" with the tag name of your form data
            NodeList list = doc.getElementsByTagName("myTagName");
            if (list.getLength() == 0) {
                throw new IllegalStateException("No element named myTagName in content.xml");
            }
            // plain NodeList access; no cast to a parser-internal class is needed
            Node node = list.item(0);
            nodeContent = new StringBuilder();
            // print the xml with the form data
            prettyPrint(node);
            System.out.println(nodeContent);
        }
    }


    // Rebuilds the markup of element and text nodes recursively.
    // Note: attributes are not emitted and text is not re-escaped.
    private static void prettyPrint(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            nodeContent.append(node.getNodeValue());
        } else if (node.getNodeType() == Node.ELEMENT_NODE) {
            nodeContent.append("<").append(node.getNodeName()).append(">");
            NodeList kids = node.getChildNodes();
            for (int i = 0; i < kids.getLength(); i++) {
                prettyPrint(kids.item(i));
            }
            nodeContent.append("</").append(node.getNodeName()).append(">");
        }
    }


    // Dumps the whole document, indented, using the JDK's built-in transformer
    // instead of the deprecated Xerces OutputFormat/XMLSerializer classes.
    private static void printDocument(Document doc) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.transform(new DOMSource(doc), new StreamResult(System.out));
    }
}
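
If you do not know the tag name in advance, a possible variation is to look for the XForms instance elements by namespace and let the JDK serialize the node, which also keeps attributes and escaping intact. The following is only a minimal sketch under some assumptions: it assumes the form data sits in content.xml inside xforms:instance elements in the standard XForms namespace (http://www.w3.org/2002/xforms), and the class name XFormInstanceExtractor, its method names, and the inline sample XML are made up for illustration, not taken from a real document.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper class, named here for illustration only.
class XFormInstanceExtractor {

    // Namespace URI of W3C XForms.
    static final String XFORMS_NS = "http://www.w3.org/2002/xforms";

    // Returns the serialized first child element of every xforms:instance in the stream.
    static List<String> extractInstances(InputStream contentXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder().parse(contentXml);

        // Identity transformer: copies a DOM subtree to text, attributes included.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> result = new ArrayList<>();
        NodeList instances = doc.getElementsByTagNameNS(XFORMS_NS, "instance");
        for (int i = 0; i < instances.getLength(); i++) {
            NodeList kids = instances.item(i).getChildNodes();
            for (int j = 0; j < kids.getLength(); j++) {
                Node kid = kids.item(j);
                if (kid.getNodeType() == Node.ELEMENT_NODE) {
                    StringWriter out = new StringWriter();
                    transformer.transform(new DOMSource(kid), new StreamResult(out));
                    result.add(out.toString());
                    break; // only the first element child is treated as the data root
                }
            }
        }
        return result;
    }

    // Opens the .odt as a zip archive and reads content.xml from it.
    static List<String> extractFromOdt(String odtPath) throws Exception {
        try (ZipFile zip = new ZipFile(odtPath)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IllegalArgumentException("content.xml not found in " + odtPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return extractInstances(in);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // Real use: pass the path of the .odt file as the first argument.
            extractFromOdt(args[0]).forEach(System.out::println);
            return;
        }
        // Made-up sample standing in for the relevant part of content.xml.
        String sample =
              "<office:forms xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
            + " xmlns:xforms=\"http://www.w3.org/2002/xforms\">"
            + "<xforms:model id=\"m1\"><xforms:instance id=\"i1\">"
            + "<instanceData><name lang=\"en\">John</name></instanceData>"
            + "</xforms:instance></xforms:model></office:forms>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        extractInstances(in).forEach(System.out::println);
    }
}
```

With a real file you would call XFormInstanceExtractor.extractFromOdt("C://myFile.odt") and get one string per instance. If your document contains several models, you may still want to filter on the id attribute of the enclosing element before serializing.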