How to read an ODS document?

2.6k views Asked by At

My goal is to simply read an ODS file that was created with Libre Office - Calc.

Using org.apache.odftoolkit:odfdom-java:0.8.8-incubating I open the ODS file and convert the content to scala xml as follows:

val doc = OdfSpreadsheetDocument.loadDocument(classOf[OdfTest].getResourceAsStream("/test.ods"))
val dom: OdfContentDom = doc.getContentDom
val xml = XML.loadString(dom.toString)

This produces the following xml:

<office:document-content office:version="1.2"
                         xmlns:calcext="urn:org:documentfoundation:names:experimental:calc:xmlns:calcext:1.0"
                         xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
                         xmlns:css3t="http://www.w3.org/TR/css3-text/" xmlns:dc="http://purl.org/dc/elements/1.1/"
                         xmlns:dom="http://www.w3.org/2001/xml-events"
                         xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
                         xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
                         xmlns:drawooo="http://openoffice.org/2010/draw"
                         xmlns:field="urn:openoffice:names:experimental:ooo-ms-interop:xmlns:field:1.0"
                         xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
                         xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
                         xmlns:formx="urn:openoffice:names:experimental:ooxml-odf-interop:xmlns:form:1.0"
                         xmlns:grddl="http://www.w3.org/2003/g/data-view#"
                         xmlns:math="http://www.w3.org/1998/Math/MathML"
                         xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
                         xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
                         xmlns:of="urn:oasis:names:tc:opendocument:xmlns:of:1.2"
                         xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
                         xmlns:ooo="http://openoffice.org/2004/office" xmlns:oooc="http://openoffice.org/2004/calc"
                         xmlns:ooow="http://openoffice.org/2004/writer"
                         xmlns:presentation="urn:oasis:names:tc:opendocument:xmlns:presentation:1.0"
                         xmlns:rpt="http://openoffice.org/2005/report"
                         xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
                         xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
                         xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
                         xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
                         xmlns:tableooo="http://openoffice.org/2009/table"
                         xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
                         xmlns:xforms="http://www.w3.org/2002/xforms" xmlns:xhtml="http://www.w3.org/1999/xhtml"
                         xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsd="http://www.w3.org/2001/XMLSchema"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <office:scripts></office:scripts>
    <office:font-face-decls>
        <style:font-face style:font-family-generic="swiss" style:font-pitch="variable" style:name="Liberation Sans"
                         svg:font-family="'Liberation Sans'"></style:font-face>
        <style:font-face style:font-family-generic="system" style:font-pitch="variable" style:name="DejaVu Sans"
                         svg:font-family="'DejaVu Sans'"></style:font-face>
        <style:font-face style:font-family-generic="system" style:font-pitch="variable" style:name="Lohit Hindi"
                         svg:font-family="'Lohit Hindi'"></style:font-face>
    </office:font-face-decls>
    <office:automatic-styles>
        <style:style style:family="table-column" style:name="co1">
            <style:table-column-properties fo:break-before="auto"
                                           style:column-width="1.5866in"></style:table-column-properties>
        </style:style>
        <style:style style:family="table-column" style:name="co2">
            <style:table-column-properties fo:break-before="auto"
                                           style:column-width="1.222in"></style:table-column-properties>
        </style:style>
        <style:style style:family="table-column" style:name="co3">
            <style:table-column-properties fo:break-before="auto"
                                           style:column-width="0.889in"></style:table-column-properties>
        </style:style>
        <style:style style:family="table-row" style:name="ro1">
            <style:table-row-properties fo:break-before="auto" style:row-height="0.1681in"
                                        style:use-optimal-row-height="true"></style:table-row-properties>
        </style:style>
        <style:style style:family="table-row" style:name="ro2">
            <style:table-row-properties fo:break-before="auto" style:row-height="0.1756in"
                                        style:use-optimal-row-height="true"></style:table-row-properties>
        </style:style>
        <style:style style:family="table" style:master-page-name="Default" style:name="ta1">
            <style:table-properties style:writing-mode="lr-tb" table:display="true"></style:table-properties>
        </style:style>
        <style:style style:family="table-cell" style:name="ce1" style:parent-style-name="Default">
            <style:text-properties fo:font-weight="bold" style:font-weight-asian="bold"
                                   style:font-weight-complex="bold"></style:text-properties>
        </style:style>
    </office:automatic-styles>
    <office:body>
        <office:spreadsheet>
            <table:table table:name="Sheet1" table:style-name="ta1">
                <office:forms form:apply-design-mode="false" form:automatic-focus="false"></office:forms>
                <table:table-column table:default-cell-style-name="Default" table:style-name="co1"></table:table-column>
                <table:table-column table:default-cell-style-name="Default" table:style-name="co2"></table:table-column>
                <table:table-column table:default-cell-style-name="Default" table:style-name="co3"></table:table-column>
                <table:table-row table:style-name="ro1">
                    <table:table-cell office:value-type="string" table:style-name="ce1">
                        <text:p>A</text:p>
                    </table:table-cell>
                    <table:table-cell office:value-type="string" table:style-name="ce1">
                        <text:p>B</text:p>
                    </table:table-cell>
                    <table:table-cell office:value-type="string" table:style-name="ce1">
                        <text:p>C</text:p>
                    </table:table-cell>
                </table:table-row>
                <table:table-row table:style-name="ro2">
                    <table:table-cell office:value-type="string">
                        <text:p>1</text:p>
                    </table:table-cell>
                    <table:table-cell office:value-type="string">
                        <text:p>2</text:p>
                    </table:table-cell>
                    <table:table-cell office:value-type="string">
                        <text:p>3</text:p>
                    </table:table-cell>
                </table:table-row>
                <table:table-row table:style-name="ro1">
                    <table:table-cell office:value-type="string">
                        <text:p>11</text:p>
                    </table:table-cell>
                    <table:table-cell office:value-type="string">
                        <text:p>22</text:p>
                    </table:table-cell>
                    <table:table-cell office:value-type="string">
                        <text:p>33</text:p>
                    </table:table-cell>
                </table:table-row>
                <table:table-row table:number-rows-repeated="1048572" table:style-name="ro1">
                    <table:table-cell table:number-columns-repeated="3"></table:table-cell>
                </table:table-row>
                <table:table-row table:style-name="ro1">
                    <table:table-cell table:number-columns-repeated="3"></table:table-cell>
                </table:table-row>
            </table:table>
            <table:named-expressions></table:named-expressions>
        </office:spreadsheet>
    </office:body>
</office:document-content>

How can the XML be converted into a List (spreadsheet) of Lists (rows containing cell data) in Scala?

List(
  List("A", "B", "C"),
  List("1", "2", "3"),
  List("11", "22", "33"),
)
1

There are 1 answers

0
0__ On BEST ANSWER

You can use the backslash \ operator to go into the structure. Read the XPath bit of this Scala XML introduction.

val rowsXML = xml \ "body" \ "spreadsheet" \ "table" \ "table-row"
val rows    = rowsXML.map(r => (r \ "table-cell" \ "p").map(_.text))
val rowsNE  = rows.filterNot(_.forall(_.isEmpty))

The last line drops empty rows.