Python subprocess behaves differently when called from Django vs. a unit test

My first time posting - please go easy on me. I could not come up with a succinct title that summarizes this issue. I seem to have a codec problem.

My Django-based website calls a subprocess (soffice) to convert uploaded documents to plain text files, so that the text can then be processed further. This worked beautifully for a time. On my local dev machine, the unit tests for file conversion still pass, and so does the complete Django app, end-to-end. On the production server, where it all used to work, the file conversion call no longer behaves the same way from within the Django app, although it does work properly when run from the test code. This change in behavior appears to be the result of running general server updates.

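import subprocess

# Convert the uploaded document to plain text with LibreOffice.
args = ['soffice',
        '--headless',
        '--convert-to',
        'txt:Text',
        '--outdir',
        outDir,
        filePath]

subprocess.call(args)

# No encoding is passed, so open() falls back to the locale's preferred encoding.
try:
    with open(textFilePath, "r") as fo:
        docText = fo.read()
except (OSError, UnicodeDecodeError):
    print("Failed to read", textFilePath)
    docText = None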

I removed some of the error checking to simplify a bit.

When I run the file conversion code as part of the complete Django application on the production server, I can see that certain special characters, such as the symbol §, are turned into garbage. But if I run the same file conversion code on its own, outside of Django, on the same machine, those symbols are not corrupted. As mentioned, on my dev machine it works both standalone and within Django. The one difference between the two machines is how I run Django. Locally, it runs under Django's runserver command. On the production machine, it runs under mod_wsgi with Apache. I don't see how Django or mod_wsgi could interfere with what soffice is doing in the subprocess, but it does appear that way. I have opened a Python shell on the problem server and run essentially the same code as above, getting clean text back, and running the unit tests against it works too.
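One way to narrow this down would be to print the locale-related settings from both places (the Python shell and a Django view running under mod_wsgi) and compare them. The following is only a rough sketch; `encoding_report` is a made-up helper name, not something from the application.

```python
import locale
import os
import sys


def encoding_report():
    """Collect settings that influence how text is encoded and decoded."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        # Encoding used by open() when no encoding argument is given.
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in encoding_report().items():
        print(key, "=", value)
```

If the values differ between the shell and the web process, the difference in the environment, rather than the conversion code, is the more likely cause.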

Any help is sincerely appreciated!

There are 2 answers

Dan (best answer)

The solution was to upgrade mod_wsgi using:

pip install mod_wsgi --upgrade

Graham Dumpleton

If you are using mod_wsgi daemon mode, make sure you set the lang/locale; otherwise the process inherits a default encoding of ASCII from the operating system.

This would propagate through to subprocesses as well.
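Because a child process inherits its parent's environment, the locale can also be pinned for the soffice call alone. This is a minimal sketch, not a confirmed fix for the question: `utf8_env` and `convert_to_text` are hypothetical helper names, and `en_US.UTF-8` is an assumed locale that has to be installed on the server.

```python
import os
import subprocess


def utf8_env(lang="en_US.UTF-8"):
    """Return a copy of the current environment with a UTF-8 locale set."""
    env = os.environ.copy()  # copy, so the parent's environment is untouched
    env["LANG"] = lang
    env["LC_ALL"] = lang
    return env


def convert_to_text(file_path, out_dir):
    """Run soffice with an explicit locale and return its exit code."""
    args = ["soffice", "--headless", "--convert-to", "txt:Text",
            "--outdir", out_dir, file_path]
    return subprocess.call(args, env=utf8_env())
```

Passing `env=` only affects that one subprocess. The web process itself keeps whatever locale mod_wsgi gave it, so reading the converted file may still need an explicit encoding.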

If you are not using daemon mode, you really should consider it, as it is preferred over mod_wsgi's embedded mode. In embedded mode it is somewhat harder to change the lang/locale, because that must be done in the Apache startup scripts, and how you do that depends on the platform and distro.
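To check which mode an application is actually running in, the WSGI environ can be inspected: mod_wsgi sets `mod_wsgi.process_group`, which is empty in embedded mode. The small test application below is a sketch under that assumption; `describe_mode` is a hypothetical helper name.

```python
def describe_mode(environ):
    """Report the mod_wsgi mode based on the process group in the environ."""
    group = environ.get("mod_wsgi.process_group")
    if group is None:
        return "unknown"  # key absent: probably not running under mod_wsgi
    # An empty process group means embedded mode; a name means daemon mode.
    return "daemon" if group else "embedded"


def application(environ, start_response):
    """Minimal WSGI app that returns the detected mode as plain text."""
    body = ("mod_wsgi mode: " + describe_mode(environ)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```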