I converted a PDF document to a Word document using ABBYY FineReader. Apache POI does not recognize the table in the resulting Word document as an XWPFTable.
Below is the table format:
Heading1 Heading2 Heading3 Heading4
Sub-heading1 Sub-heading2
2011 36.66 ABC 24,000 C
2012 46.90 ABC 78,000 C
ABC 90,000 D
Below is my piece of code:
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.xwpf.usermodel.BodyElementType;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;

public class TableExtraction {
    public static void main(String[] args) {
        // try-with-resources closes both the stream and the document
        try (FileInputStream fis = new FileInputStream("<path to docx file>");
             XWPFDocument xdoc = new XWPFDocument(fis)) {
            // Walk the top-level body elements (paragraphs, tables, content controls) in order
            for (IBodyElement element : xdoc.getBodyElements()) {
                if (element.getElementType() == BodyElementType.TABLE) {
                    System.out.println("Table Data");
                    // Use the element itself; getBody().getTables() would re-print every table in the body
                    XWPFTable table = (XWPFTable) element;
                    System.out.println("Total Number of Rows of Table:" + table.getNumberOfRows());
                    System.out.println(table.getText());
                } else {
                    System.out.println("Not a Table Data");
                }
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
Output:
Not a Table Data
I tried your code on a Word table of my own, and it didn't work. Assuming it is a regular Word table, you can iterate over the tables directly with XWPFDocument.getTables() instead of walking the body elements.
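A minimal sketch of that approach, assuming Apache POI 4.x or later on the classpath. The class name TableDump and the helper rowsOf are illustrative, not part of POI. It opens the .docx, asks the document for its top-level tables and prints each row's cell texts separated by " | ".

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

// Hypothetical helper class, named for illustration only.
class TableDump {
    // Returns the text of every cell in the table, one inner list per row.
    static List<List<String>> rowsOf(XWPFTable table) {
        List<List<String>> rows = new ArrayList<>();
        for (XWPFTableRow row : table.getRows()) {
            List<String> cells = new ArrayList<>();
            for (XWPFTableCell cell : row.getTableCells()) {
                cells.add(cell.getText());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java TableDump <file.docx>");
            return;
        }
        // Closes both the stream and the document when done.
        try (FileInputStream fis = new FileInputStream(args[0]);
             XWPFDocument doc = new XWPFDocument(fis)) {
            // getTables() only returns tables that sit directly in the document body.
            List<XWPFTable> tables = doc.getTables();
            System.out.println("Number of tables: " + tables.size());
            for (int i = 0; i < tables.size(); i++) {
                System.out.println("Table " + (i + 1) + ":");
                for (List<String> row : rowsOf(tables.get(i))) {
                    System.out.println(String.join(" | ", row));
                }
            }
        }
    }
}
```

For a table with merged headers like the one in the question, rows may not all have the same length: a horizontally merged cell (w:gridSpan) comes back as a single cell, while a vertically merged cell (w:vMerge) typically leaves empty continuation cells in the rows below it.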
If this doesn't work either, something has probably gone wrong in the conversion from PDF.
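One way to check what the conversion produced is to look inside the .docx itself, which is a ZIP package whose main text lives in word/document.xml; Word tables are stored there as w:tbl elements. The sketch below uses only the JDK (Java 9+ for readAllBytes). The class name DocxTableCheck is hypothetical, and counting tags with a regular expression is a rough assumption-laden shortcut, not a real XML parse.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic class, named for illustration only.
class DocxTableCheck {
    // Matches an opening w:tbl tag, but not w:tblPr, w:tblGrid or the closing </w:tbl>.
    private static final Pattern TABLE_TAG = Pattern.compile("<w:tbl[\\s>]");

    // Counts opening table tags anywhere in the XML, including nested ones.
    static int countTables(String documentXml) {
        Matcher m = TABLE_TAG.matcher(documentXml);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Reads word/document.xml out of the .docx package as a UTF-8 string.
    static String readDocumentXml(String docxPath) throws IOException {
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found in " + docxPath);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java DocxTableCheck <file.docx>");
            return;
        }
        String xml = readDocumentXml(args[0]);
        System.out.println("w:tbl elements in word/document.xml: " + countTables(xml));
    }
}
```

If this count is zero, the converter probably did not emit a Word table at all (the content may have been laid out as positioned text instead). If it is non-zero while XWPFDocument.getTables() returns nothing, the tables may sit somewhere that body-level iteration does not visit, such as inside a text box.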