Normally I process files in Python using a with statement, as in this chunk for downloading a resource via HTTP:
    with open(filename, "wb") as file:
        for chunk in request.iter_content(chunk_size=1024):
            if chunk:
                file.write(chunk)
                file.flush()
But this assumes I know the filename. Suppose I want to use `tempfile.mkstemp()`. This function returns a handle to an open file and a pathname, so using `open` in a `with` statement would be wrong.
I've searched around a bit and found lots of warnings about being careful to use `mkstemp` properly. Several blog articles nearly shout when they say do NOT throw away the integer returned by `mkstemp`. There are discussions about the OS-level file handle being different from a Python-level file object. That's fine, but I haven't been able to find the simplest coding pattern that would ensure that `mkstemp` is called to get a file to be written to and that, after writing, the Python file and its underlying OS file handle are both closed cleanly even in the event of an exception. This is precisely the kind of behavior we can get with a `with open(...)` pattern.
So my question is: is there a nice way in Python to create and write to a `mkstemp`-generated file, perhaps using a different kind of `with` statement, or do I have to manually do things like `fdopen` or `close`, etc.? It seems there should be a clear pattern for this.
The simplest coding pattern for this is `try`/`finally`.
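A minimal sketch of that pattern, writing straight to the descriptor with `os.write`. The function name `save_chunks` and the `chunks` argument are placeholders invented here; `chunks` stands in for something like `request.iter_content(chunk_size=1024)` from the question.

```python
import os
import tempfile


def save_chunks(chunks):
    """Write an iterable of bytes chunks to a fresh mkstemp file; return its path."""
    fd, path = tempfile.mkstemp()
    try:
        for chunk in chunks:
            if chunk:  # skip empty chunks, as in the question's loop
                os.write(fd, chunk)
    finally:
        # Runs on success and on exception alike, so the descriptor never leaks.
        os.close(fd)
    return path
```

Note that `mkstemp` never deletes the file for you, so the caller is responsible for removing `path` when done. Also, `os.write` can in principle write fewer bytes than it was given; for ordinary disk files that is rare, and this sketch does not loop on its return value.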
However, if you're doing this more than once, it's trivial to wrap it up in a context manager. And then you can just use that in a `with` statement.
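One way such a wrapper and its use might look, built on `contextlib.contextmanager`. The names `mkstemp_fd` and `download_to_temp` are hypothetical, and `chunks` again stands in for the response's chunk iterator.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def mkstemp_fd(*args, **kwargs):
    """Hypothetical helper: yield (fd, path) from mkstemp and always close fd."""
    fd, path = tempfile.mkstemp(*args, **kwargs)
    try:
        yield fd, path
    finally:
        os.close(fd)


def download_to_temp(chunks):
    """Write bytes chunks to a temp file through the helper; return the path."""
    with mkstemp_fd(suffix=".part") as (fd, path):
        for chunk in chunks:
            if chunk:
                os.write(fd, chunk)
    # fd is closed here, even if the loop raised; the file itself still exists.
    return path
```

The helper only closes the descriptor. It deliberately leaves the file on disk, since the usual reason to reach for `mkstemp` is that something else needs the path afterwards.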
If you really want to, of course, you can always wrap the fd up in a file object (by passing it to `open`, or `os.fdopen` in older versions). But… why go to the extra trouble? If you want an fd, use it as an fd.

And if you don't want an fd, unless you have a good reason that you need `mkstemp` instead of the simpler and higher-level `NamedTemporaryFile`, you shouldn't be using the low-level API. Just use `NamedTemporaryFile` in a `with` statement instead.

Besides being simpler to `with`, this also has the advantage that it's already a Python file object instead of just an OS file descriptor (and, in Python 3.x, it can be a Unicode text file).

An even simpler solution is to avoid the tempfile completely.
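Almost all XML parsers have a way to parse a string instead of a file. With `cElementTree`, it's just a matter of calling `fromstring` instead of `parse`. So, instead of saving the download to a file and calling `parse` on it, just pass the downloaded content straight to `fromstring`.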
Of course the first version only needs to hold the XML document and the parsed tree in memory one after the other, while the second needs to hold them both at once, so this may increase your peak memory usage by about 30%. But this is rarely a problem.
If it is a problem, many XML libraries have a way to feed in data as it arrives, and many downloading libraries have a way to stream data bit by bit. As you might imagine, this is again true for cElementTree's `XMLParser` and for `requests`, in a few different ways. For example, you can feed each chunk to the parser as it arrives. That's not quite as simple as just using `fromstring`, but it's still simpler than using a temporary file, and probably more efficient to boot.

If you write that loop with the two-argument form of `iter` and it confuses you (a lot of people seem to have trouble grasping it at first), you can rewrite it as an explicit loop.
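A sketch of incremental parsing in both forms, using `XMLParser.feed` and `XMLParser.close` from `xml.etree.ElementTree`. The function names are hypothetical, and `stream` is assumed to be any binary file-like object with a `read(size)` method. With `requests`, the `raw` attribute of a response fetched with `stream=True` is the usual candidate, though that is not exercised here.

```python
import io
import xml.etree.ElementTree as ET


def parse_stream_iter(stream, chunk_size=8192):
    """Feed the parser chunk by chunk using two-argument iter."""
    parser = ET.XMLParser()
    # iter(callable, sentinel) keeps calling stream.read(chunk_size)
    # and stops as soon as it returns the sentinel b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        parser.feed(chunk)
    return parser.close()  # returns the root element


def parse_stream_loop(stream, chunk_size=8192):
    """The same logic written as an explicit loop."""
    parser = ET.XMLParser()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        parser.feed(chunk)
    return parser.close()


# Demo with an in-memory stream and a tiny chunk size to force several feeds.
root = parse_stream_iter(io.BytesIO(b"<root><a>1</a><b>2</b></root>"), chunk_size=4)
```

Either way, only one chunk of raw XML is held in memory at a time alongside the growing tree.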