How to convert PDF to DOCX on Linux

I am trying to convert a PDF file to Word, Excel, and PowerPoint formats. I have already tried many commands, such as these:

soffice -env:UserInstallation=file:///$HOME/.libreoffice-headless/ --convert-to docx:"Microsoft Word 2007/2010/2013 XML" file.pdf
/usr/bin/soffice --headless --invisible --convert-to docx file.pdf
soffice --infilter="writer_pdf_import" --convert-to doc file.pdf

/usr/bin/libreoffice --headless --invisible --convert-to doc file.pdf
/usr/bin/soffice --headless --convert-to docx:"Microsoft Word 2007/2010/2013 XML" file.pdf

abiword --to=doc file.pdf
unoconv -f doc file.pdf
lowriter --invisible --convert-to doc 'file.pdf'

I always get this error message from soffice/libreoffice/unoconv:

:1: parser error : Document is empty
%PDF-1.7

And this one from abiword:

Unable to init server: Could not connect: Connection refused

** (abiword:6477): WARNING **: clutter failed 0, get a life.
Unable to init server: Could not connect: Connection refused

Every command except abiword produces a doc file, but it is full of garbled characters. I never get a proper file.

I am building a file converter, so I only want a command-line method. I don't want to use a third-party API.

Thank you

2

There are 2 answers

5
Splinteer On

I managed to do it with soffice. I had to install the libreoffice-pdfimport package, and you must also pass --infilter="writer_pdf_import".
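
The fix described in this answer can be sketched as a small shell function. This is only an illustration under some assumptions: the function name pdf_to_docx and the SOFFICE override variable are hypothetical, the apt-get line assumes a Debian/Ubuntu system where the package is still called libreoffice-pdfimport, and the quality of the resulting DOCX depends heavily on the PDF.

```shell
#!/usr/bin/env bash
# Sketch: convert one PDF to DOCX with LibreOffice's PDF import filter.
# Assumes the PDF import component is installed, for example with
#   sudo apt-get install libreoffice-pdfimport   (Debian/Ubuntu package name)

# pdf_to_docx is a hypothetical helper name. SOFFICE is an optional
# override for the binary path (for example /usr/bin/soffice).
pdf_to_docx() {
    local input="$1"
    local outdir="${2:-.}"

    # Fail early instead of handing a missing file to soffice.
    if [ ! -f "$input" ]; then
        echo "pdf_to_docx: no such file: $input" >&2
        return 1
    fi

    mkdir -p "$outdir" || return 1

    # --infilter selects the Writer PDF import filter explicitly.
    # --outdir controls where the converted file is written.
    "${SOFFICE:-soffice}" --headless \
        --infilter="writer_pdf_import" \
        --convert-to docx \
        --outdir "$outdir" \
        "$input"
}
```

After sourcing the snippet, a call such as pdf_to_docx file.pdf out should leave the converted document in the out directory, assuming the import filter is available on the machine.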

0
rob grune On

Linux has a few applications that can import a PDF as an image: LibreOffice, Okular, and Calibre.

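But if you want editable text, you need a text-extraction utility such as pdf2txt. Note that pdf2txt is not part of pdftk; it ships with pdfminer (pdfminer.six installs it as pdf2txt.py), and the output file is given with -o. The terminal command is:

pdf2txt -o output.txt input.pdf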

Afterwards, import the text file into a word processor and complete the final editing and formatting.
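
As an alternative for the text-extraction step, here is a minimal sketch that uses pdftotext from poppler-utils. That is a swapped-in tool, not the one named in this answer, and the function name pdf_to_text and the PDFTOTEXT override variable are hypothetical. It only produces plain text, so formatting still has to be redone in the word processor.

```shell
#!/usr/bin/env bash
# Sketch: extract editable text from a PDF with pdftotext (poppler-utils).

# pdf_to_text is a hypothetical helper name. PDFTOTEXT is an optional
# override for the binary path.
pdf_to_text() {
    local input="$1"
    # Default output name: same path with .pdf replaced by .txt.
    local output="${2:-${input%.pdf}.txt}"

    # Fail early instead of handing a missing file to pdftotext.
    if [ ! -f "$input" ]; then
        echo "pdf_to_text: no such file: $input" >&2
        return 1
    fi

    # -layout asks pdftotext to preserve the physical page layout
    # (columns, indentation) as far as plain text allows.
    "${PDFTOTEXT:-pdftotext}" -layout "$input" "$output"
}
```

Calling pdf_to_text input.pdf output.txt writes the extracted text to output.txt; with a single argument the output name is derived from the input name.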