XWPFTable not recognising a table in a Word document

I converted a PDF document to a Word document using ABBYY FineReader. The table in the resulting Word document is not recognized by XWPFTable (Apache POI).

Below is the table format:

Heading1        Heading2        Heading3    Heading4
Sub-heading1    Sub-heading2
2011            36.66           ABC         24,000 C
2012            46.90           ABC         78,000 C
                                ABC         90,000 D

Below is my piece of code:

import java.io.FileInputStream;
import java.util.Iterator;
import java.util.List;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

public class TableExtraction {
  public static void main(String[] args) {
    try {
      FileInputStream fis = new FileInputStream("<path to docx file>");
      XWPFDocument xdoc=new XWPFDocument(OPCPackage.open(fis));
      Iterator<IBodyElement> bodyElementIterator = xdoc.getBodyElementsIterator();
      while(bodyElementIterator.hasNext()) {
        IBodyElement element = bodyElementIterator.next();
        if("TABLE".equalsIgnoreCase(element.getElementType().name())) {
          System.out.println("Table Data");
          List<XWPFTable> tableList =  element.getBody().getTables();
          for (XWPFTable table: tableList) {
            System.out.println("Total Number of Rows of Table:" + table.getNumberOfRows());
            System.out.println(table.getText());
          }
        }
        else {
          System.out.println("Not a Table Data"); 
        }
      }
      xdoc.close();
    }
    catch(Exception ex) {
      ex.printStackTrace();
    } 
  }
}  

Output:

Not a Table Data

There is 1 answer

Answer by JensS:

I tried it with your code on a Word table of mine, and it didn't work. Assuming that it is a regular Word table, you can iterate over the tables directly like this:

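import java.io.FileInputStream;
import java.io.IOException;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class TableExtraction {
    // Path to the .docx file to read
    private static final String FILE_NAME = "<path to docx file>";

    public static void main(String[] args) throws IOException {
        // try-with-resources closes the document and the stream, even on failure
        try (FileInputStream fis = new FileInputStream(FILE_NAME);
             XWPFDocument xdoc = new XWPFDocument(fis)) {
            // getTables() returns the tables found directly in the document body
            for (XWPFTable table : xdoc.getTables()) {
                System.out.println(table.getRows().size());

                // In case you want to do more with the table cells...
                for (XWPFTableRow row : table.getRows()) {
                    for (XWPFTableCell cell : row.getTableCells()) {
                        for (XWPFParagraph para : cell.getParagraphs()) {
                            System.out.println(para.getText());
                        }
                    }
                }
            }
        }
    }
}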

If this doesn't work either, something has probably gone wrong in the conversion from PDF: the converter may not have written the content as a real Word table at all.
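
One way to check whether the conversion produced a real table is to look at the raw markup inside the .docx, which is a ZIP archive whose main part is word/document.xml. The sketch below uses only the JDK (Java 9+ for readAllBytes), not Apache POI. The class name DocxTableProbe and its helper methods are made up for illustration. It assumes the XML is UTF-8 and uses the conventional "w:" namespace prefix, which is typical but not guaranteed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper, not part of Apache POI.
public class DocxTableProbe {

    // Counts non-overlapping occurrences of needle in haystack.
    static int countOccurrences(String haystack, String needle) {
        int count = 0;
        int from = 0;
        while ((from = haystack.indexOf(needle, from)) != -1) {
            count++;
            from += needle.length();
        }
        return count;
    }

    // Reads the main document part of a .docx file as text (assumes UTF-8).
    static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocxTableProbe <path to docx file>");
            return;
        }
        String xml = readDocumentXml(args[0]);
        // "<w:tbl>" matches table elements only, not <w:tblPr> or <w:tblGrid>.
        System.out.println("tables (w:tbl): " + countOccurrences(xml, "<w:tbl>"));
        // Text box content; tables inside it are not in the body-level table list.
        System.out.println("text boxes (w:txbxContent): " + countOccurrences(xml, "<w:txbxContent"));
        // Tab characters inside runs, often used to fake columns.
        System.out.println("tab characters (w:tab): " + countOccurrences(xml, "<w:tab/>"));
    }
}
```

If the table count is 0 and there are many tab characters, the converter most likely laid the columns out with tabs, so there is no table for XWPFTable to find. If tables exist but text boxes are present too, the table may be nested inside a text box, which xdoc.getTables() would not return as far as I know. In either case, re-exporting from FineReader with a different layout setting might be worth a try.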