How to convert a scanned PDF to a DOCX

Question


188 views · Asked by Musaib Ahmed Razzaqui on 26 October 2023 at 16:05

Hey, I'm stuck on a problem. I want to convert a scanned PDF to a DOCX document while preserving the formatting. How do I use layout-parser so that the diagrams and tables in the scanned PDF are preserved?
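A layout-aware pipeline usually detects regions first and then handles each region type differently. Figures are cropped and embedded as pictures rather than OCR'd, tables are OCR'd into cells, and running text is OCR'd into paragraphs. The following is a minimal sketch of only the routing and reading-order step, under stated assumptions: `Block` is a hypothetical stand-in for layout-parser's detection output, the page is assumed to have two columns split at a fixed x coordinate, and the action names are made up for illustration. The actual cropping, OCR and DOCX writing would plug in at each action.

```python
from dataclasses import dataclass
from typing import List, Tuple


# Hypothetical stand-in for one detected layout region. Layout detectors such as
# layout-parser report something similar: a label plus a pixel bounding box.
@dataclass
class Block:
    kind: str                        # e.g. "Text", "Title", "Table", "Figure"
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1), origin at top-left


def _top_left(block: Block) -> Tuple[int, int]:
    """Sort key: top edge first, then left edge."""
    return (block.bbox[1], block.bbox[0])


def reading_order(blocks: List[Block], column_split: int) -> List[Block]:
    """Naive two-column reading order: the whole left column top-to-bottom,
    then the right column. column_split is an assumed x coordinate."""
    left = [b for b in blocks if b.bbox[0] < column_split]
    right = [b for b in blocks if b.bbox[0] >= column_split]
    return sorted(left, key=_top_left) + sorted(right, key=_top_left)


# Hypothetical action names: what a DOCX writer would do with each region.
ACTIONS = {
    "Figure": "insert_image",   # crop the pixels and embed them as a picture
    "Table": "ocr_table",       # OCR each cell and build a Word table
    "Title": "ocr_heading",     # OCR the text and apply a heading style
}


def plan_docx(blocks: List[Block], column_split: int) -> List[Tuple[str, Block]]:
    """Pair every region, in reading order, with the action to apply to it;
    anything not listed in ACTIONS is treated as a plain paragraph."""
    return [(ACTIONS.get(b.kind, "ocr_paragraph"), b)
            for b in reading_order(blocks, column_split)]


if __name__ == "__main__":
    page = [
        Block("Text", (620, 100, 1100, 400)),
        Block("Title", (50, 40, 1100, 90)),
        Block("Figure", (50, 120, 580, 500)),
        Block("Table", (50, 520, 580, 900)),
    ]
    for action, block in plan_docx(page, column_split=600):
        print(action, block.bbox)
```

With the sample page this prints the title first, then the figure and table from the left column, and finally the right-column text. Pages with full-width blocks placed between columns would need a smarter ordering than this fixed split.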

I tried converting the page images to hOCR with pytesseract, but it doesn't handle images. Also, the text output is very annoying to work with.
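The hOCR that pytesseract returns (for example from `pytesseract.image_to_pdf_or_hocr(image, extension='hocr')`) is XHTML in which each word sits in an `ocrx_word` span whose `title` carries a `bbox` and an `x_wconf` confidence. As a minimal sketch, assuming Tesseract-style hOCR, the standard library alone can turn that markup into lines of text with coordinates. The sample string and the names `HocrWords` and `lines_from_hocr` are illustrative and not part of pytesseract; the set of line-level classes is an assumption based on typical Tesseract output.

```python
import re
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
CONF_RE = re.compile(r"x_wconf (\d+)")
# Classes Tesseract commonly uses for line-level elements (assumption).
LINE_CLASSES = {"ocr_line", "ocr_header", "ocr_caption", "ocr_textfloat"}

Word = Tuple[str, Optional[Tuple[int, ...]], Optional[int], Optional[str]]


class HocrWords(HTMLParser):
    """Collect (text, bbox, confidence, line id) for every ocrx_word span."""

    def __init__(self) -> None:
        super().__init__()
        self.words: List[Word] = []
        self._line_id: Optional[str] = None
        self._word = None          # (bbox, conf) of the word being read
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        title = attr.get("title") or ""
        if LINE_CLASSES.intersection(classes):
            self._line_id = attr.get("id")
        elif "ocrx_word" in classes:
            box = BBOX_RE.search(title)
            conf = CONF_RE.search(title)
            self._word = (tuple(int(v) for v in box.groups()) if box else None,
                          int(conf.group(1)) if conf else None)
            self._text = []

    def handle_data(self, data):
        if self._word is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Inner <strong>/<em> tags are ignored; the closing </span> ends the word.
        if tag == "span" and self._word is not None:
            text = "".join(self._text).strip()
            if text:
                bbox, conf = self._word
                self.words.append((text, bbox, conf, self._line_id))
            self._word = None


def lines_from_hocr(hocr: str, min_conf: int = 0) -> List[str]:
    """Rebuild text lines, dropping words whose confidence is below min_conf."""
    parser = HocrWords()
    parser.feed(hocr)
    parser.close()
    lines: Dict[Optional[str], List[str]] = {}   # dicts keep insertion order
    for text, _bbox, conf, line_id in parser.words:
        if conf is not None and conf < min_conf:
            continue
        lines.setdefault(line_id, []).append(text)
    return [" ".join(words) for words in lines.values()]


SAMPLE = """<div class='ocr_page' title='bbox 0 0 1200 1600'>
 <span class='ocr_line' id='line_1_1' title='bbox 36 92 580 122; baseline 0 -5'>
  <span class='ocrx_word' id='word_1_1' title='bbox 36 92 96 116; x_wconf 95'>Scanned</span>
  <span class='ocrx_word' id='word_1_2' title='bbox 110 92 180 116; x_wconf 91'>page</span>
 </span>
 <span class='ocr_line' id='line_1_2' title='bbox 36 130 580 160; baseline 0 -5'>
  <span class='ocrx_word' id='word_1_3' title='bbox 36 130 120 154; x_wconf 42'>Tab1e</span>
  <span class='ocrx_word' id='word_1_4' title='bbox 130 130 200 154; x_wconf 88'>data</span>
 </span>
</div>"""

print(lines_from_hocr(SAMPLE))               # ['Scanned page', 'Tab1e data']
print(lines_from_hocr(SAMPLE, min_conf=50))  # ['Scanned page', 'data']
```

Because each word keeps its `bbox`, the same data can be used to place text near its original position on the page, or to spot the empty regions where figures sat, information that plain `image_to_string` output does not carry.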


There are 2 answers

**Oppa Oppa** · Answer 1 · 2023-10-26T16:12:08+00:00


Create a free trial account for Adobe Acrobat, then open your PDF in Adobe Acrobat. Go to “File”, pick “Save As Other”, then choose “Microsoft Word” and “Word Document”. Finally, choose a name and a location for your Word document.

**K J** · Answer 2 · 2023-10-27T00:43:35+00:00

Word can import scanned PDF pages. Your biggest problem will be whichever OCR method was used, since its output has to be edited to match the image, which means manual styling. For example, here I used red (my preference) for a scan of this page.

You will probably need to look at commercial offerings from ABBYY, Acme, Adobe, Apryse, through to z-I-got OCR PDF, etc.
