I've tried using Adobe Acrobat X Pro to "recognize text in multiple files."

When I started the process and it asked for a directory, I chose C:, my main hard drive.

It took hours to load, and when it did, the list of files it generated included Word documents as well. Adobe said I couldn't proceed until I removed the problem files.

Once I had removed all the PDFs Adobe flagged as having errors (like password protection) and the prompt remained, I assumed it meant the Word documents in the list.

So I manually removed those too. But Adobe still said I couldn't proceed until the problem files were removed, even though there were no files left in the list that it had flagged as having issues.

My firm is trying to make sure all the PDFs we have are searchable. Currently, some are and some aren't. Our goal is to make them all searchable without moving them from their various locations.


1 Answer

Joris Schellekens (Best Answer)

I think you can do this using a combination of:

  • plain Java: to list all the files in a directory that match a given criterion (e.g. their name ends with '.pdf')
  • iText: to iterate over each PDF document and extract all its images
  • Tess4J: a Java port of Tesseract (Google's OCR engine), to turn the extracted images back into text
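
For illustration, here is a minimal sketch of how those three pieces could fit together, assuming iText 5's parser API (PdfReaderContentParser / RenderListener) and Tess4J on the classpath; the root directory and the tessdata path are placeholders you would adjust.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;

import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;

public class OcrAllPdfs {

    public static void main(String[] args) throws IOException {
        // 1. Plain Java: recursively list every file under the root whose name ends in '.pdf'.
        Path root = Paths.get("C:/documents");      // hypothetical starting directory
        try (Stream<Path> files = Files.walk(root)) {
            files.filter(p -> p.toString().toLowerCase().endsWith(".pdf"))
                 .forEach(OcrAllPdfs::ocrImagesIn);
        }
    }

    private static void ocrImagesIn(Path pdf) {
        Tesseract tesseract = new Tesseract();
        tesseract.setDatapath("C:/tessdata");        // location of the Tesseract language data (assumption)

        try {
            // 2. iText: walk the content of every page and hand each image to a render listener.
            PdfReader reader = new PdfReader(pdf.toString());
            PdfReaderContentParser parser = new PdfReaderContentParser(reader);
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                parser.processContent(page, new RenderListener() {
                    @Override
                    public void renderImage(ImageRenderInfo info) {
                        try {
                            BufferedImage image = info.getImage().getBufferedImage();
                            if (image == null) {
                                return;              // some image formats cannot be decoded this way
                            }
                            // 3. Tess4J: run OCR on the extracted image.
                            String text = tesseract.doOCR(image);
                            System.out.println(pdf + " -> " + text);
                        } catch (IOException | TesseractException e) {
                            e.printStackTrace();
                        }
                    }
                    @Override public void beginTextBlock() { }
                    @Override public void endTextBlock() { }
                    @Override public void renderText(TextRenderInfo textInfo) { }
                });
            }
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```

Note that this sketch only extracts the images and prints the recognized text; to make the PDFs searchable in place you would still need to write that text back into each file (for example as an invisible text layer).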

Unless I am much mistaken, Tesseract itself even offers a crude version of this workflow, but only for one PDF at a time, so you'd still need some Windows/Linux scripting to feed it all the files in a given directory.