Extract text from PDF using Foxit SDK


I am using Foxit SDK to extract the text from a PDF document.

Everything works for English, but when I extract text from a PDF in another language, I don't get the correct output.

I have also tried PDFBox in Java, but its output is worse than what Foxit SDK gives me.

Are there any other libraries that can solve this issue, or is there some other solution?
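
Before switching libraries, it may be worth checking whether the text is already wrong when it leaves the extractor, or only after it is written out with the platform default charset. The following is a minimal Java sketch of that check, assuming the extracted text is available as a String. The class and method names are hypothetical and are not part of Foxit SDK or PDFBox, and the sample string is only a stand-in for real extractor output.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any PDF library.
final class ExtractedTextCheck {

    // Counts code points that often indicate a failed character mapping:
    // U+FFFD (the replacement character) and private-use code points.
    static long countSuspicious(String text) {
        return text.codePoints()
                .filter(cp -> cp == 0xFFFD
                        || Character.getType(cp) == Character.PRIVATE_USE)
                .count();
    }

    // Encodes the text explicitly as UTF-8 rather than relying on the
    // platform default charset.
    static byte[] toUtf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the string returned by the extractor (assumption).
        String extracted = "Gr\u00fc\u00dfe, \u043c\u0438\u0440";

        System.out.println("suspicious code points: " + countSuspicious(extracted));

        // Write the text as UTF-8 so that non-English characters survive.
        Path out = Files.createTempFile("extracted", ".txt");
        Files.write(out, toUtf8Bytes(extracted));
        System.out.println("written to " + out);
    }
}
```

If the count is high, the problem is probably in the PDF itself (for example, fonts without usable Unicode mappings), and a different library may or may not help. If the count is zero but the saved file looks wrong, the output encoding is the more likely cause.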

There are 3 answers

0
Andrew Cash On

You might want to try the trial version of Quick PDF Library to see how it performs on your documents. http://www.quickpdflibrary.com

QP.GetPageText(7) or QP.GetPageText(8) returns pretty good results for most PDF files.

Andrew.

Disclaimer: I do some consulting work for Quick PDF Library.

1
MyKuLLSKI On

Personally, I think that if you want it done right, you have to pay for it. ComponentOne has a PDFViewer for WPF. I'm not sure which framework you're working with, since your tags don't mention one.

ComponentOne PDF Viewer for WPF

0
Moody Ibrahim Moody On

If you are on Windows, you can use the IFilter that Adobe provides. I used the one that ships with Adobe Reader 8. Here is a link to the exact example I used:

http://www.codeproject.com/Articles/13391/Using-IFilter-in-C

The performance was okay, I think, though I haven't used many other methods. It takes about 15 seconds for a 400-page PDF.