Hey, I'm stuck on a problem. I want to convert a scanned PDF to a .docx document while preserving the formatting. How do I use layout-parser so that the diagrams and tables in the scanned PDF are preserved?
I tried converting with pytesseract (image to hOCR), but it doesn't handle images. Also, the hOCR text output is messy and hard to work with.
Sign up for a free trial of Adobe Acrobat and open your PDF in it. Go to “File,” pick “Save As Other,” then choose “Microsoft Word” and “Word Document.” Then choose a name and a location for your Word document.
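
If you would rather stay with the pytesseract route from the question, the hOCR output gets easier to handle once you reduce it to plain words plus their bounding boxes. Below is a minimal sketch using only the standard library. The names `HocrWordParser` and `words_from_hocr` are made up for illustration, and it assumes the usual hOCR convention of `ocrx_word` spans carrying a `bbox x0 y0 x1 y1` entry in their `title` attribute. It only recovers text positions; it does not detect diagrams or tables by itself.

```python
import re
from html.parser import HTMLParser

# hOCR stores coordinates in the title attribute, e.g. "bbox 10 10 50 30; x_wconf 96"
BBOX_RE = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")


class HocrWordParser(HTMLParser):
    """Collect (text, bbox) pairs for every span with class 'ocrx_word'.

    Hypothetical helper for illustration, not part of pytesseract.
    """

    def __init__(self):
        super().__init__()
        self.words = []     # list of (text, (x0, y0, x1, y1))
        self._bbox = None   # bbox of the word span currently open, if any
        self._text = []     # text fragments seen inside that span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "ocrx_word" in classes:
            match = BBOX_RE.search(attrs.get("title") or "")
            if match:
                self._bbox = tuple(int(v) for v in match.groups())
                self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an open word span.
        if self._bbox is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Inline tags such as <strong> inside a word are ignored;
        # the closing </span> ends the word.
        if tag == "span" and self._bbox is not None:
            text = "".join(self._text).strip()
            if text:
                self.words.append((text, self._bbox))
            self._bbox = None


def words_from_hocr(hocr):
    """Return a list of (text, (x0, y0, x1, y1)) from an hOCR string."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


# Assumed usage: pytesseract returns hOCR as bytes, so decode it first.
#   hocr = pytesseract.image_to_pdf_or_hocr(page_image, extension="hocr").decode("utf-8")
#   words = words_from_hocr(hocr)
```

With the boxes in hand you can at least tell which parts of the page contain text. Regions with no words are candidates for figures that you could crop from the page image and paste into the .docx, though that part is left out here.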