Is there a way to automatically close a Python temporary file returned by mkstemp()?


Normally I process files in Python using a with statement, as in this chunk for downloading a resource via HTTP:

with open(filename, "wb") as file:
    for chunk in request.iter_content(chunk_size=1024):
        if chunk:  # skip empty keep-alive chunks
            file.write(chunk)
            file.flush()

But this assumes I know the filename. Suppose I want to use tempfile.mkstemp(). This function returns a handle to an open file and a pathname, so using open in a with statement would be wrong.

I've searched around a bit and found lots of warnings about being careful to use mkstemp properly. Several blog articles nearly shout when they say do NOT throw away the integer returned by mkstemp. There are discussions about the os-level filehandle being different from a Python-level file object. That's fine, but I haven't been able to find the simplest coding pattern that would ensure that

  • mkstemp is called to get a file to be written to
  • after writing, the Python file and its underlying OS filehandle are both closed cleanly, even in the event of an exception. This is precisely the kind of behavior we get with a with open(...) pattern.

So my question is: is there a nice way in Python to create and write to a mkstemp-generated file, perhaps using a different kind of with statement, or do I have to manually do things like fdopen or close? It seems there should be a clear pattern for this.

Answer by abarnert (best answer)

The simplest coding pattern for this is try/finally:

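import os
import tempfile

fd, pathname = tempfile.mkstemp()
try:
    dostuff(fd)  # placeholder for work on the raw descriptor
finally:
    os.close(fd)  # always release the OS-level descriptor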

However, if you're doing this more than once, it's trivial to wrap it up in a context manager:

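import contextlib
import os
import tempfile

@contextlib.contextmanager
def mkstemping(*args, **kwargs):
    # pathname is not exposed to the caller in this version
    fd, pathname = tempfile.mkstemp(*args, **kwargs)
    try:
        yield fd
    finally:
        os.close(fd)  # runs on normal exit and on exceptions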

And then you can just do:

with mkstemping() as fd:
    dostuff(fd)
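
Because mkstemp never deletes the file for you, callers usually need the pathname as well as the descriptor. A minimal sketch of a variant that yields both follows; the name mkstemp_fd and the sample payload are made up for illustration and are not part of the answer above.

```python
import contextlib
import os
import tempfile

@contextlib.contextmanager
def mkstemp_fd(*args, **kwargs):
    """Hypothetical helper: yield (fd, path) and always close the descriptor."""
    fd, path = tempfile.mkstemp(*args, **kwargs)
    try:
        yield fd, path
    finally:
        os.close(fd)  # the file itself stays on disk for the caller

with mkstemp_fd(suffix=".bin") as (fd, path):
    os.write(fd, b"payload")  # write through the raw descriptor

# The descriptor is closed here, but the file still exists.
with open(path, "rb") as f:
    data = f.read()
os.remove(path)  # deleting the file is the caller's job
```

Yielding a tuple keeps the helper close to mkstemp's own return value, so existing code that unpacks fd and pathname needs little change.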

If you really want to, of course, you can always wrap the fd up in a file object (by passing it to open, or os.fdopen in older versions). But… why go to the extra trouble? If you want an fd, use it as an fd.
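
If you do want a file object, here is a rough sketch of the wrapping step on Python 3, where the built-in open accepts an integer descriptor. The helper name write_temp is hypothetical.

```python
import os
import tempfile

def write_temp(data):
    """Hypothetical helper: write bytes to a mkstemp file and return its path."""
    fd, path = tempfile.mkstemp()
    # open() accepts an integer descriptor; closefd defaults to True,
    # so leaving the with block closes the file object and the descriptor.
    with open(fd, "wb") as f:
        f.write(data)
    return path

path = write_temp(b"hello")
with open(path, "rb") as f:
    content = f.read()
os.remove(path)  # mkstemp leaves cleanup to the caller
```

This gives the with-statement behavior asked about in the question: one block, and both the Python file and the OS descriptor are closed on exit, including when an exception is raised.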

And if you don't want an fd, unless you have a good reason that you need mkstemp instead of the simpler and higher-level NamedTemporaryFile, you shouldn't be using the low-level API. Just do this:

with tempfile.NamedTemporaryFile(delete=False) as f:
    dostuff(f)

Besides being simpler to use in a with statement, this also has the advantage that f is already a Python file object instead of just an OS file descriptor (and, in Python 3.x, it can be a Unicode text file if you pass a text mode such as mode="w").
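
As an illustration of the text-mode case on Python 3, the sketch below writes a Unicode string and reads it back after the with block has closed the file. The suffix and the sample text are arbitrary choices made for the example.

```python
import os
import tempfile

# delete=False keeps the file on disk after the with block closes it.
with tempfile.NamedTemporaryFile(mode="w", encoding="utf-8",
                                 suffix=".txt", delete=False) as f:
    f.write("héllo\n")
    path = f.name  # remember the pathname for later use

closed_after_block = f.closed

with open(path, encoding="utf-8") as reader:
    text = reader.read()
os.remove(path)  # with delete=False, cleanup is up to the caller
```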


An even simpler solution is to avoid the tempfile completely.

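Almost all XML parsers have a way to parse a string instead of a file. With cElementTree (plain xml.etree.ElementTree on Python 3.3 and later, where cElementTree is deprecated; it was removed in 3.9), it's just a matter of calling fromstring instead of parse. So, instead of this: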

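import tempfile
import xml.etree.ElementTree as ET
import requests

req = requests.get(url)
with tempfile.NamedTemporaryFile() as f:
    f.write(req.content)
    f.seek(0)  # rewind so the parser reads from the start
    tree = ET.parse(f)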

… just do this:

req = requests.get(url)
tree = ET.fromstring(req.content)

Of course the first version only needs to hold the XML document and the parsed tree in memory one after the other, while the second needs to hold them both at once, so this may increase your peak memory usage by about 30%. But this is rarely a problem.
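
One detail to watch when switching: in the standard library's xml.etree.ElementTree, parse returns an ElementTree, while fromstring returns the root Element directly. A small sketch with an in-memory document, where the XML bytes are made up and stand in for req.content:

```python
import io
import xml.etree.ElementTree as ET

xml_bytes = b"<root><item>1</item><item>2</item></root>"  # stand-in for req.content

tree = ET.parse(io.BytesIO(xml_bytes))  # file-like input -> ElementTree
root_from_parse = tree.getroot()

root_from_string = ET.fromstring(xml_bytes)  # bytes input -> root Element

items = [e.text for e in root_from_string.iter("item")]
```

Code that later calls tree.getroot() therefore needs a small adjustment after moving to fromstring.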

If memory is a problem, many XML libraries can be fed data as it arrives, and many downloading libraries can stream data bit by bit. Both are true here: ElementTree's XMLParser has a feed method, and requests can stream in a few different ways. For example:

req = requests.get(url, stream=True)
parser = ET.XMLParser()
# raw.read returns bytes, so the sentinel must be b'' (equal to '' only on Python 2)
for chunk in iter(lambda: req.raw.read(8192), b''):
    parser.feed(chunk)
tree = parser.close()

Not quite as simple as just using fromstring… but it's still simpler than using a temporary file, and probably more efficient to boot.

If that use of the two-argument form of iter confuses you (a lot of people seem to have trouble grasping it at first), you can rewrite it as:

req = requests.get(url, stream=True)
parser = ET.XMLParser()
while True:
    chunk = req.raw.read(8192)
    if not chunk:
        break
    parser.feed(chunk)
tree = parser.close()
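
To try the incremental pattern without a network connection, here is a sketch that swaps req.raw for an in-memory io.BytesIO. The document contents and the tiny chunk size are arbitrary, chosen only to force several feed calls.

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for req.raw: any object with a read(n) method that returns bytes.
source = io.BytesIO(b"<root><item>a</item><item>b</item></root>")

parser = ET.XMLParser()
# iter(callable, sentinel) calls the lambda repeatedly and stops as soon
# as it returns the sentinel, here the empty bytes object read at EOF.
for chunk in iter(lambda: source.read(8), b""):
    parser.feed(chunk)
root = parser.close()  # returns the root Element

texts = [e.text for e in root]
```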