Paperless-ngx redo OCR for documents

Question

732 views Asked by KaO At 20 March 2023 at 16:51

I'm trying to redo the OCR for my documents in Paperless-ngx because some clearly legible text in the PDFs is missing or was not indexed automatically. How can I redo OCR for specific documents?

I'm using the docker installation so I have the following containers running:

paperless-webserver-1
paperless-broker-1
paperless-db-1
paperless-gotenberg-1
paperless-tika-1

I found the following discussion on the GitHub page, but it doesn't explain how to actually do it; it only says "implemented".

The documentation also mentions PAPERLESS_OCR_MODE=<mode>, but again it gives no example, and I couldn't find where to apply the setting.

Thank you :)


There is 1 answer

**Ogreucha** · Accepted Answer · 2023-09-21T09:47:56+00:00

You can force OCR to run again on a single document with this command:

docker exec -d -e "PAPERLESS_OCR_MODE=force" paperless-webserver-1 document_archiver --overwrite --document [HERE_COMES_THE_DOCUMENT_ID]
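If several documents need the same treatment, the command above can be wrapped in a small loop. The following is only a sketch under a few assumptions: the container is named paperless-webserver-1 as in the question, the document IDs are passed as script arguments, and the script name reocr.sh, the reocr function, and the DRY_RUN variable are made up for illustration. It drops the -d flag so each run finishes (and prints any errors) before the next one starts.

```shell
#!/bin/sh
# Sketch: re-run OCR for several Paperless-ngx documents, one after another.
# CONTAINER defaults to the container name from the question; override it
# if your compose project uses a different name (assumption).
CONTAINER="${CONTAINER:-paperless-webserver-1}"

reocr() {
    # $1 is the numeric ID of the document to reprocess.
    # If DRY_RUN is set to a non-empty value, the command is only printed
    # (hypothetical helper variable, not part of Paperless-ngx or Docker).
    ${DRY_RUN:+echo} docker exec -e "PAPERLESS_OCR_MODE=force" "$CONTAINER" \
        document_archiver --overwrite --document "$1"
}

# Process every ID given on the command line.
for id in "$@"; do
    reocr "$id"
done
```

Saved as reocr.sh, it could be called as `sh reocr.sh 12 57 103`, where the numbers are placeholder document IDs. Running it first with `DRY_RUN=1 sh reocr.sh 12` shows the exact docker command without executing anything.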
