Let me start off by saying my python knowledge is beginner-to-intermediate level, and I recently started using the language again after a long time.
The Goal:
This morning I came across a bunch of word documents I wanted to convert and concatenate to PDF files, with 2 .doc files creating one PDF. seemed like a fairly trivial task, so I figured I'd try to learn how to do it in python. concatenating PDFs wasn't too bad, I found PyPDF2 and managed to write a script that did just that.
But 7 hours later, after countless scripts with broken dependencies- I still can't find a way to automate the doc-pdf conversion.
The Problem(s):
every script I found either:
- uses python-docx (my documents are word 2003 .docs)
- uses unoconv bridge (which I installed along with OpenOffice, then searched around for documentation but found none- thus I have no idea how to call from a python script or the shell. I saw one example for this but it keeps throwing errors)
- uses win32com or win32com.client or pywin32 or somesuch. I ran into numerous issues with these- installed one but couldn't import it from code (as happened to the guy here), now I can't even find them with pip. searched for documentation for them (are they modules or classes? I have no idea) and found practically nothing that I could understand, beyond that they're connected to ActivePython. (which is apparantly a superset of Python with more capabilities?).
- Uses comtypes, which I installed but was unable to use/import either for some reason (maybe I'm using pip wrong somehow?)
I know my question is hardly focused but honestly by now my brain is fried from information overload. any simplifications for a noob would be more than welcome.
TL;DR:
assuming no knowledge of COM stuff and little experience with any external frameworks:
- what would I have to do to convert Word 2003 .doc files to .pdf files? I'm running python3.5.1 32-bit on a Windows 10 64-bit machine.
- where can I learn more about accessing other software APIs from python? are there big prerequisites for this stuff like knowing how the OS works on a lower level?
Thanks!
From my experience, converting between the various office formats is best done outside of python. With the subprocess module, you can call the external command
where soffice is the command that comes with LibreOffice.