Can't read the .docx file I got after converting a PDF with the soffice command


I am trying to convert a PDF to DOCX using soffice. It produces a .docx file, but the content ends up in text boxes, which I cannot read with the Python docx library (python-docx). Is there a better way to read the file, or a better way to convert PDF to DOCX so that I do not get text boxes?

soffice --infilter="writer_pdf_import" --convert-to docx "convert_this.pdf"
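On the reading side, one workaround is to skip python-docx's paragraph API and read the XML directly. A .docx file is a ZIP archive whose main part is word/document.xml. Text-box content lives in w:txbxContent elements, and python-docx's Document.paragraphs only yields body-level paragraphs, so it does not descend into them. The sketch below uses only the standard library. It assumes the text boxes are written the way LibreOffice typically exports them: a DrawingML copy inside mc:Choice and a VML copy inside mc:Fallback. It skips the fallback so each box is returned once. The function name docx_text is hypothetical, and tabs, line breaks and table structure are not handled.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs from the Office Open XML specification.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
MC_NS = "http://schemas.openxmlformats.org/markup-compatibility/2006"

P_TAG = f"{{{W_NS}}}p"                 # paragraph
T_TAG = f"{{{W_NS}}}t"                 # text inside a run
FALLBACK_TAG = f"{{{MC_NS}}}Fallback"  # legacy (VML) duplicate of a shape


def _walk(elem, lines, buf):
    """Depth-first walk that collects the text of every w:p, including
    paragraphs nested inside text boxes (w:txbxContent), into `lines`."""
    if elem.tag == FALLBACK_TAG:
        # Assumption: mc:Fallback only repeats what mc:Choice already holds,
        # so skipping it reports each text box once.
        return
    if elem.tag == P_TAG:
        # Each paragraph gets its own buffer; a paragraph nested inside a
        # text box is collected separately from the paragraph anchoring it.
        own = []
        for child in elem:
            _walk(child, lines, own)
        text = "".join(own)
        if text:
            lines.append(text)
        return
    if elem.tag == T_TAG and buf is not None:
        buf.append(elem.text or "")
    for child in elem:
        _walk(child, lines, buf)


def docx_text(path):
    """Return the non-empty paragraph strings of a .docx file (body and
    text-box paragraphs alike) in the order they appear in document.xml."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    lines = []
    _walk(root, lines, None)
    return lines
```

Assuming the converted file is convert_this.docx in the current directory, `print("\n".join(docx_text("convert_this.docx")))` dumps its text. Because the PDF import positions text boxes absolutely, their order in document.xml may not match the visual reading order of the original PDF.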


Answer by Alexey Noskov:

You can try using Aspose.Words for Cloud to convert PDF to Word documents: https://docs.aspose.cloud/display/wordscloud/Convert+PDF+Document+to+Word. It converts the PDF from a fixed-layout form to a flow form, so the result is editable in MS Word.

Disclosure: I work on the Aspose.Words team.