TarFile.extractfile() as context manager raises 'AttributeError: __enter__'


Here is the relevant code:

for file in files:
    with readfile(file) as openfile:
        molecules.append(process_file_fn(openfile))

and I am getting this error from the code above:

src/datamodules/components/edm/process.py", line 92, in process_xyz_files
    with readfile(file) as openfile:
AttributeError: __enter__

Here is the definition of the readfile:

if tarfile.is_tarfile(data):
    tardata = tarfile.open(data, "r")
    files = tardata.getmembers()
    
    def readfile(data_pt):
        return tardata.extractfile(data_pt)

My data is 1234.xyz.tar.bz2

Any insights or suggestions are appreciated. Thank you in advance.

I tried specifying the read mode in both the function and the loop, but I get the same error.

There are 4 answers

4
s_pike On

To use a with statement like that, the object returned by readfile() needs the __enter__ and __exit__ methods; one option is to write a class that defines them.

Have a look at the responses here: Implementing use of 'with object() as f' in custom class in python

If you don't want to implement the context manager you could try changing your for loop to:

for file in files:
    openfile = readfile(file)
    molecules.append(process_file_fn(openfile))
4
Jorge Luis On

I would say you don't need a context manager for your use case. You can call the function directly: openfile = readfile(file).

If you really need a context manager, you can define your function like so:

import contextlib
@contextlib.contextmanager
def readfile(data_pt):
    yield tardata.extractfile(data_pt)

The decorator contextlib.contextmanager will define the methods __enter__ and __exit__ for you.
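As a rough sketch of the same idea, the generator can also close the extracted file when the with block ends. The name open_member and the explicit tar argument are assumptions made for illustration (the original readfile closes over tardata instead). Note that extractfile() can still return None for members that are not regular files, so the caller has to check for that.

```python
import contextlib
import tarfile

@contextlib.contextmanager
def open_member(tar, member):
    """Yield the file object for member (or None) and close it on exit.

    open_member is a hypothetical helper, not part of the tarfile module.
    """
    # extractfile() returns None for directories and other non-file members.
    extracted = tar.extractfile(member)
    try:
        yield extracted
    finally:
        if extracted is not None:
            extracted.close()
```

Inside the loop this would be used as "with open_member(tardata, file) as openfile:", followed by an "if openfile is not None:" check before processing.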

0
SIGHUP On

Not sure if this helps but here goes anyway...

You could write your own context manager class that would handle a single tar file. For the sake of simplicity we'll assume that all we want to do is extract a known member of the archive. Thus our class could look like this:

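import tarfile

class TarfileHandler:
    def __init__(self, filename):
        self._filename = filename
        self._fd = None  # the archive is opened lazily, on first use

    @property
    def fd(self):
        # Open the archive the first time it is needed and reuse it afterwards.
        if self._fd is None:
            self._fd = tarfile.open(self._filename)
        return self._fd

    def extract(self, member):
        # Best effort: any failure (e.g. a missing member) yields None.
        # extractfile() itself also returns None for non-file members.
        try:
            return self.fd.extractfile(member)
        except Exception:
            return None

    def __enter__(self):
        return self

    def __exit__(self, *_):
        # Close the archive if it was ever opened.
        if self._fd:
            self._fd.close()
            self._fd = None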

Now let's contrive a use-case. We know where the tar file is. We know that it contains 'foo.txt'. We want to extract 'foo.txt' and copy it somewhere.

TARFILE = 'mytarfile.tar'
MEMBER = 'foo.txt'
TARGET = 'foo.txt'

with TarfileHandler(TARFILE) as tfh:
    if data := tfh.extract(MEMBER):
        with open(TARGET, 'wb') as out:
            out.write(data.read())

Hopefully this shows you how to implement a context manager class and how you might adapt it to your needs.

1
djvg On

So far, none of the other answers have explained what is actually causing the AttributeError: __enter__.

Cause

What it boils down to is that the OP is trying to use the return value from TarFile.extractfile() as a context manager.

However, the value returned by TarFile.extractfile() can be either an io.BufferedReader or None:

... If member is a regular file or a link, an io.BufferedReader object is returned. For all other existing members, None is returned. ...

See how this is implemented in the tarfile source.

Although the BufferedReader can be used as a context manager, None obviously cannot.

So, the AttributeError: __enter__ will occur, for example, if you call extractfile() on a member that represents a directory.
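The behaviour can be reproduced with a small in-memory archive. This is only a sketch: the member names and the helper functions are made up for the demonstration. Also note that the exception type depends on the Python version: older versions raise AttributeError: __enter__, while Python 3.11 and later raise a TypeError instead.

```python
import io
import tarfile

def build_demo_archive():
    """Return the bytes of a tar archive holding one directory and one file."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        # A directory member: extractfile() returns None for this one.
        dir_info = tarfile.TarInfo(name="data")
        dir_info.type = tarfile.DIRTYPE
        tar.addfile(dir_info)
        # A regular file member with a few bytes of content.
        payload = b"2\ncomment\n"
        file_info = tarfile.TarInfo(name="data/1234.xyz")
        file_info.size = len(payload)
        tar.addfile(file_info, io.BytesIO(payload))
    return buffer.getvalue()

def probe_members(archive_bytes):
    """Map each member name to whether extractfile() returned a file object."""
    results = {}
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r") as tar:
        for member in tar.getmembers():
            extracted = tar.extractfile(member)
            results[member.name] = extracted is not None
    return results

def with_none_error():
    """Return the exception type raised by using None in a with statement."""
    try:
        with None:
            pass
    except (AttributeError, TypeError) as exc:
        return type(exc)

print(probe_members(build_demo_archive()))
print(with_none_error())
```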

Solution

A workaround for the OP's example could be to check if the member is actually a file:

for file in files:
    if file.isfile():  # <-- NEW
        with readfile(file) as openfile:
            ...

Note that files = tardata.getmembers(), in the OP's example, implies that file is actually an archive member, in the form of a TarInfo instance. To avoid confusion, I would rename them to members and member, respectively.
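Putting the check and the renaming together, the whole loop might look like the sketch below. The function name process_archive is an assumption; process_file_fn and molecules are taken from the OP's snippet. The archive itself is also opened in a with statement so that it gets closed afterwards, and mode "r" lets tarfile detect the bz2 compression on its own.

```python
import tarfile

def process_archive(path, process_file_fn):
    """Apply process_file_fn to every regular file in the archive at path.

    process_archive is a hypothetical wrapper around the OP's loop.
    """
    molecules = []
    # TarFile objects are context managers too; "r" auto-detects compression.
    with tarfile.open(path, "r") as tardata:
        for member in tardata.getmembers():
            if not member.isfile():
                # Directories and other special members have no file object.
                continue
            with tardata.extractfile(member) as openfile:
                molecules.append(process_file_fn(openfile))
    return molecules
```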