Django FCGI seems to change the behavior of the Python standard library


I'm trying to walk a directory tree and return True if it contains any file of a certain type:

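# (inside a function; requires "import os")
for dirpath, dirnames, filenames in os.walk(location):
  for fn in filenames:
    # stop at the first .eml or .zip file anywhere in the tree
    if fn.endswith(".eml") or fn.endswith(".zip"):
      return True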

This code always runs inside a mount point for a remote system. NFS mounts have never caused a problem.

Recently someone set up a CIFS mount in which one of the file names contains the bytes \xc2\xb9 (the UTF-8 encoding of the superscript one character, ¹). In this case, we got a traceback:
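As a quick sanity check (a sketch, not part of the original post), decoding that byte pair as UTF-8 with the standard library yields a single code point, U+00B9:

```python
import unicodedata

raw = b"\xc2\xb9"              # the two bytes seen in the file name
char = raw.decode("utf-8")     # interpret them as UTF-8 -> one code point
assert char == u"\u00b9"
print(unicodedata.name(char))  # SUPERSCRIPT ONE
```

Both bytes are above 0x7F, which is why an ASCII decode of the name fails on the first of them (0xc2).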

  for dirpath, dirnames, filenames in os.walk(location):
File "/usr/lib64/python2.6/os.py", line 294, in walk
  for x in walk(path, topdown, onerror, followlinks):
File "/usr/lib64/python2.6/os.py", line 284, in walk
  if isdir(join(top, name)):
File "/usr/lib64/python2.6/posixpath.py", line 70, in join
  path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 23: ordinal not in range(128)

Now, here's the kicker: this only happens when the code is executing within FCGI. I can run the same code on the same tree as a standalone program, and there is no traceback. Any suggestions, apart from "Don't use os.walk()"?

Disclaimer: We're using an old version of Django. I can't change that.

Best answer, by tsuraan:

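The issue appears to be that os.walk is being given a unicode object. When os.listdir receives a unicode path, it returns unicode names for the entries it can decode, but any name it cannot decode comes back as a plain byte string. The path += ... line in posixpath.join then concatenates the unicode path with that non-ASCII byte string, and Python 2 implicitly decodes the byte string as ASCII, which fails. The Django vs. console difference is probably because parameters coming from Django (query params, URL parts, etc.) are unicode, while strings passed as arguments from the CLI are plain byte strings (str objects).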
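Python 3 no longer mixes the two string types implicitly, so the failure can't be reproduced there directly. The hypothetical helper `py2_style_join` below only mimics what the quoted `path += '/' + b` line effectively does in Python 2 when `path` is unicode and `b` is a byte string: the byte string is decoded with the default ASCII codec before concatenation. It is an illustration of the mechanism, not the real `posixpath.join`.

```python
def py2_style_join(unicode_top, byte_name):
    # Hypothetical helper: in Python 2, unicode + str decodes the str with
    # the default codec (ASCII) first; here that decode is done explicitly.
    component = (b"/" + byte_name).decode("ascii")
    return unicode_top + component

# An ASCII-only name joins cleanly.
print(py2_style_join(u"/mnt/share", b"inbox.eml"))

# A name containing the UTF-8 bytes for superscript one fails with the same
# kind of error as the traceback in the question.
try:
    py2_style_join(u"/mnt/share", b"report\xc2\xb9.eml")
except UnicodeDecodeError as exc:
    print(exc)
```

The reported position counts from the start of `'/' + b`, just as in the original traceback.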

One solution is to pass location.encode('utf-8') to os.walk. With a byte-string path, os.listdir returns byte strings as well, so Python never tries to convert the directory contents into unicode objects.
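A minimal sketch of that fix, wrapped in a hypothetical `contains_mail_files` helper. It assumes UTF-8 is the right encoding for the path (the answer's choice) and keeps the suffixes as byte strings so every comparison stays bytes-to-bytes. The same code should run on Python 2.6+ (where `bytes` is an alias for `str`) and on Python 3, where os.walk also yields bytes names for a bytes path.

```python
import os

def contains_mail_files(location, encoding="utf-8"):
    """Return True if any file under `location` ends in .eml or .zip."""
    # Text (unicode) paths are encoded first, so os.walk works on bytes and
    # never has to decode the names it reads from the directory.
    if not isinstance(location, bytes):
        location = location.encode(encoding)
    suffixes = (b".eml", b".zip")
    for dirpath, dirnames, filenames in os.walk(location):
        for fn in filenames:
            if fn.endswith(suffixes):
                return True
    return False
```

Because the walk now yields byte strings, any name that is later shown in a Django page should be decoded explicitly, for example with `fn.decode('utf-8', 'replace')`.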