Searching through PDF text with Node.js

Question

5.9k views Asked by markkazanski At 14 August 2018 at 18:59

I have thousands of searchable PDFs, some of which are up to 1 GB with over 2000 pages. I need to be able to search for a text string in these files using a Node.js app.

Right now, files are stored in a Google Cloud Storage bucket.

What's the best way to do this?

Some options:

1. Read the text from the PDF files into MySQL using something like the npm package pdf-text-extract, then use MySQL queries to search for text strings.
2. Search the PDF files directly using some npm package.

Am I completely off? Is there a better way?

1 answer

**Konstantin Rybakov** · Answer 1 · 2018-08-14T19:34:45+00:00

There are dedicated text search libraries out there, like this one, or this. Most likely you'd need to extract the plain text from each PDF, save it, and index it. Then you'll be able to run search queries against the index. Setting up a database for this particular task may be overkill.
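To illustrate the extract, index, and search flow described above, here is a minimal sketch of an in-memory inverted index in plain JavaScript. It is not one of the libraries the answer links to, and the names `tokenize`, `buildIndex`, and `search` are hypothetical. It assumes the text has already been extracted into one string per page (for example by a tool such as pdf-text-extract) and it matches pages that contain every query word, not exact phrases.

```javascript
// Hypothetical helpers for illustration; page text is assumed to be
// already extracted from the PDFs, one string per page.

// Split text into lowercase word tokens (letters and digits only).
function tokenize(text) {
  return text.toLowerCase().match(/[\p{L}\p{N}]+/gu) || [];
}

// Build an inverted index: token -> Set of "file#page" locations.
function buildIndex(docs) {
  const index = new Map();
  for (const { file, pages } of docs) {
    pages.forEach((pageText, i) => {
      const location = `${file}#${i + 1}`; // pages are numbered from 1
      for (const token of tokenize(pageText)) {
        if (!index.has(token)) index.set(token, new Set());
        index.get(token).add(location);
      }
    });
  }
  return index;
}

// Return the sorted locations whose page contains every word of the query.
function search(index, query) {
  const tokens = tokenize(query);
  if (tokens.length === 0) return [];
  let result = null;
  for (const token of tokens) {
    const hits = index.get(token) || new Set();
    result = result === null
      ? new Set(hits)
      : new Set([...result].filter((loc) => hits.has(loc)));
  }
  return [...result].sort();
}

// Usage with made-up sample data standing in for extracted PDF text.
const docs = [
  { file: 'a.pdf', pages: ['Invoice for March', 'Total amount due'] },
  { file: 'b.pdf', pages: ['March report: amount of rainfall'] },
];
const index = buildIndex(docs);
console.log(search(index, 'march'));      // [ 'a.pdf#1', 'b.pdf#1' ]
console.log(search(index, 'amount due')); // [ 'a.pdf#2' ]
```

With thousands of files of up to 1 GB each, an index like this would probably not fit in memory, so in practice it would likely be persisted to disk or handed to a dedicated search library or service. The sketch only shows the shape of the approach.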
