java - How to get data of Textbox in docx file by docx4j


I have a project that needs to read all the content of a docx file, but I don't know how to get it. All I can get is the list of paragraphs. I want to get the data inside text boxes too. Here is my code:

    List<Object> texts = getAllElementFromObject(document.getMainDocumentPart(), P.class);

I also tried calling getAllElementFromObject(document.getMainDocumentPart(), CTTextbox.class), but I still can't get the text box data.

My method getAllElementFromObject():

    public static List<Object> getAllElementFromObject(Object obj, Class<?> toSearch) {
        List<Object> result = new ArrayList<Object>();
        if (obj instanceof JAXBElement) obj = ((JAXBElement<?>) obj).getValue();

        if (obj.getClass().equals(toSearch))
            result.add(obj);
        else if (obj instanceof ContentAccessor) {
            List<?> children = ((ContentAccessor) obj).getContent();
            for (Object child : children) {
                result.addAll(getAllElementFromObject(child, toSearch));
            }
        }
        return result;
    }

There is 1 answer

JasonPlutext answered:

A text box created in Word looks something like:

    <w:p >
        <w:r >
            <w:pict>
                <v:shapetype o:spt="202.0" path="m,l,21600r21600,l21600,xe" coordsize="21600,21600" id="_x0000_t202">
                    <v:stroke joinstyle="miter"/>
                    <v:path gradientshapeok="t" o:connecttype="rect"/>
                </v:shapetype>
                <v:shape o:gfxdata="UEsDB..8EAABkcnMvZG93bnJldzAAAAhwUAAAAA" type="#_x0000_t202" style="position:absolute;margin-left:0;margin-top:0;width:186.95pt;height:110.55pt;z-index:251659264;visibility:visible;mso-wrap-style:square;mso-width-percent:400;mso-height-percent:200;mso-wrap-distance-left:9pt;mso-wrap-distance-top:0;mso-wrap-distance-right:9pt;mso-wrap-distance-bottom:0;mso-position-horizontal:center;mso-position-horizontal-relative:text;mso-position-vertical:absolute;mso-position-vertical-relative:text;mso-width-percent:400;mso-height-percent:200;mso-width-relative:margin;mso-height-relative:margin;v-text-anchor:top" id="Text Box 2" o:spid="_x0000_s1026">
                    <v:textbox style="mso-fit-shape-to-text:t">
                        <w:txbxContent>
                            <w:p >
                                <w:r>
                                    <w:t>foo</w:t>
                                </w:r>
                            </w:p>
                        </w:txbxContent>
                    </v:textbox>
                </v:shape>
            </w:pict>
        </w:r>
    </w:p>

Here the relevant objects are:

  • org.docx4j.vml.CTTextbox
  • org.docx4j.wml.CTTxbxContent (which might contain a content control)

Your code isn't going to work since Pict doesn't implement ContentAccessor, so the recursion stops at w:pict and never reaches the text box inside it.
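To make the failure concrete, here is a small self-contained sketch. It uses stand-in types, not the real docx4j classes: `Container`, `Pict`, `Textbox` and both finder methods are hypothetical names invented for illustration, and the stand-in `Pict` holds the text box directly rather than through a shape as in the markup above. The first method has the same shape as the one in the question; the second adds an explicit branch for the node that is not a ContentAccessor.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; none of these are the real docx4j classes.
class TextboxTraversalSketch {

    interface ContentAccessor {
        List<Object> getContent();
    }

    // Plays the role of w:p / w:r: children are exposed through getContent().
    static class Container implements ContentAccessor {
        private final List<Object> content = new ArrayList<>();
        Container add(Object child) { content.add(child); return this; }
        @Override public List<Object> getContent() { return content; }
    }

    // Plays the role of w:pict: it has children but is NOT a ContentAccessor.
    static class Pict {
        private final List<Object> any = new ArrayList<>();
        Pict add(Object child) { any.add(child); return this; }
        List<Object> getAnyAndAny() { return any; }
    }

    // Plays the role of the text box; holds its text directly to keep the sketch short.
    static class Textbox {
        final String text;
        Textbox(String text) { this.text = text; }
    }

    // Same shape as the method in the question: only descends into ContentAccessor.
    static List<Object> findViaContentAccessor(Object obj, Class<?> toSearch) {
        List<Object> result = new ArrayList<>();
        if (obj.getClass().equals(toSearch)) {
            result.add(obj);
        } else if (obj instanceof ContentAccessor) {
            for (Object child : ((ContentAccessor) obj).getContent()) {
                result.addAll(findViaContentAccessor(child, toSearch));
            }
        }
        return result;
    }

    // Variant with an explicit branch for the non-ContentAccessor node.
    static List<Object> findWithPictCase(Object obj, Class<?> toSearch) {
        List<Object> result = new ArrayList<>();
        if (obj.getClass().equals(toSearch)) {
            result.add(obj);
        } else if (obj instanceof ContentAccessor) {
            for (Object child : ((ContentAccessor) obj).getContent()) {
                result.addAll(findWithPictCase(child, toSearch));
            }
        } else if (obj instanceof Pict) {
            for (Object child : ((Pict) obj).getAnyAndAny()) {
                result.addAll(findWithPictCase(child, toSearch));
            }
        }
        return result;
    }

    // paragraph -> run -> pict -> text box, a simplified version of the nesting above.
    static Container sampleDocument() {
        Container run = new Container().add(new Pict().add(new Textbox("foo")));
        return new Container().add(run);
    }
}
```

On `sampleDocument()`, the first method finds no `Textbox` and the second finds one. Hand-writing such branches for every VML type gets tedious quickly, which is why a general-purpose traversal is the better route.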

So instead, please try https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/finders/ClassFinder.java
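A minimal sketch of how ClassFinder is typically used, assuming it is driven by org.docx4j.TraversalUtil and exposes its matches in a public `results` list, as in the linked source. The class name `TextboxFinderSketch` and the command-line argument are made up for illustration, and the code needs docx4j on the classpath. Check the calls against the docx4j version you are using.

```java
import java.io.File;

import org.docx4j.TraversalUtil;
import org.docx4j.finders.ClassFinder;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
import org.docx4j.wml.CTTxbxContent;

public class TextboxFinderSketch {

    public static void main(String[] args) throws Exception {
        // args[0] is the path to the docx file (hypothetical input).
        WordprocessingMLPackage pkg = WordprocessingMLPackage.load(new File(args[0]));
        MainDocumentPart mainPart = pkg.getMainDocumentPart();

        // Collect every w:txbxContent; TraversalUtil does the walking,
        // so this does not depend on each node being a ContentAccessor.
        ClassFinder finder = new ClassFinder(CTTxbxContent.class);
        new TraversalUtil(mainPart.getContent(), finder);

        System.out.println("Text boxes found: " + finder.results.size());

        for (Object o : finder.results) {
            CTTxbxContent txbxContent = (CTTxbxContent) o;
            // The block-level children (paragraphs, tables) of the text box.
            System.out.println("Block-level children: " + txbxContent.getContent().size());
        }
    }
}
```

Each CTTxbxContent found this way contains ordinary paragraphs, so the getAllElementFromObject method from the question can then be applied to it with P.class to reach the runs and text.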