How to convert PDF to XML using PDFBox or any other library?

I want to convert PDF files into XML. Is there a Java library that can do this?

Answer by Anil Agrawal

You can get an XML (specifically XHTML) representation of a PDF document with the Apache Tika library, as below:

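import java.io.*;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.*; // AutoDetectParser, ParseContext (Tika's PDF parser is built on PDFBox)
import org.apache.tika.sax.ToXMLContentHandler;
import org.xml.sax.ContentHandler;
try (InputStream stream = new FileInputStream("sample.pdf")) { // inside a method declared "throws Exception"
    ContentHandler handler = new ToXMLContentHandler(); // accumulates the XHTML output as a string
    new AutoDetectParser().parse(stream, handler, new Metadata(), new ParseContext());
    System.out.println(handler.toString()); // parse() returns void, so print the handler's contents
}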
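Note that `ToXMLContentHandler` gives you XHTML, not an XML schema of your own. If you need a different layout, one option is to post-process that output with the JDK's built-in DOM API. The sketch below assumes, as is typical for Tika's PDF parser, that each page is wrapped in a `<div class="page">` element; the input string is a hand-written stand-in for real Tika output, and the class name `PagesToXml` and the `<document>`/`<page number="n">` element names are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PagesToXml {

    // Converts Tika-style XHTML (assumed: one <div class="page"> per page) into a
    // hypothetical <document><page number="n">text</page>...</document> layout.
    static String convert(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // Tika's XHTML uses the XHTML namespace
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document in = builder.parse(new InputSource(new StringReader(xhtml)));

        Document out = builder.newDocument();
        Element root = out.createElement("document");
        out.appendChild(root);

        // Match <div> elements in any namespace, in document order
        NodeList divs = in.getElementsByTagNameNS("*", "div");
        int pageNumber = 0;
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (!"page".equals(div.getAttribute("class"))) {
                continue; // skip any div that is not a page wrapper
            }
            Element page = out.createElement("page");
            page.setAttribute("number", Integer.toString(++pageNumber));
            page.setTextContent(div.getTextContent().trim()); // flatten the page's text
            root.appendChild(page);
        }

        // Serialize the new tree back to a string without an XML declaration
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter writer = new StringWriter();
        transformer.transform(new DOMSource(out), new StreamResult(writer));
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hand-written stand-in for the XHTML printed by the Tika snippet above
        String xhtml = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
                + "<div class=\"page\"><p>First page</p></div>"
                + "<div class=\"page\"><p>Second page</p></div>"
                + "</body></html>";
        System.out.println(convert(xhtml));
    }
}
```

Running `main` prints a `<document>` element containing one `<page>` per page. DOM keeps the example short; for large PDFs you could instead pass your own SAX `ContentHandler` straight to Tika's `parse` method and write your XML as the events arrive, which avoids holding whole trees in memory.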