I'm trying to walk a directory tree and return true if I find any of a certain type of file:
    for dirpath, dirnames, filenames in os.walk(location):
        for fn in filenames:
            if fn.endswith(".eml") or fn.endswith(".zip"):
                return True
This code always runs inside a mount point from a remote system. NFS mounts have never shown a problem.
We recently had someone do a CIFS mount where one of the file names contains a \xc2\xb9 character (superscript one). In this case, we got a traceback:
        for dirpath, dirnames, filenames in os.walk(location):
      File "/usr/lib64/python2.6/os.py", line 294, in walk
        for x in walk(path, topdown, onerror, followlinks):
      File "/usr/lib64/python2.6/os.py", line 284, in walk
        if isdir(join(top, name)):
      File "/usr/lib64/python2.6/posixpath.py", line 70, in join
        path += '/' + b
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 23: ordinal not in range(128)
Now, here's the kicker: this only happens when the code is executing within FCGI. I can run the same code on the same tree as a standalone program, and there is no traceback. Any suggestions, apart from "Don't use os.walk()"?
Disclaimer: We're using an old version of Django. I can't change that.
The issue appears to be that os.walk is being given a unicode object, so the path += ... operation is trying to convert the strings from a listdir call into unicodes before appending them to the path. The Django vs. console difference is probably because parameters coming from Django (query params, URL parts, etc.) are unicode, while strings passed as arguments from the CLI are actual string objects.

A solution to this would be passing location.encode('utf-8') to os.walk, which should stop Python from trying to convert the directory contents into unicode objects.
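As a rough sketch of that suggestion, the check from the question could be wrapped so the path is always a byte string before it reaches os.walk. The function name has_mail_or_zip and the encoding parameter are hypothetical, and UTF-8 is only an assumption about how the mount encodes its file names. The suffixes are written as byte strings so the comparison works against the byte-string names that os.walk then returns.

```python
import os

# Byte-string suffixes, to match the byte-string names os.walk yields
# when it is given a byte-string path.
SUFFIXES = (b".eml", b".zip")


def has_mail_or_zip(location, encoding="utf-8"):
    """Return True if any file under location ends in .eml or .zip.

    Hypothetical helper; `encoding` is an assumption about the mount.
    """
    # Values coming from the web framework are typically unicode objects;
    # encode them so no implicit ASCII decoding happens inside os.walk.
    if not isinstance(location, bytes):
        location = location.encode(encoding)

    for dirpath, dirnames, filenames in os.walk(location):
        for fn in filenames:
            if fn.endswith(SUFFIXES):
                return True
    return False
```

With this approach, dirpath and the file names stay as byte strings, so anything that needs to display them should decode explicitly, for example with fn.decode('utf-8', 'replace').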