AttributeError when using Python deepcopy


I have a class that overrides __eq__ and __hash__ so that its objects can act as dictionary keys. Each object also carries a dictionary, keyed by other objects of the same class. I get a weird AttributeError when I try to deepcopy the whole structure. I am using Python 3.6.0 on OS X.

From the Python docs it looks as if deepcopy uses a memo dictionary to cache the objects it has already copied, so nested structures should not be a problem. What am I doing wrong, then? Should I write my own __deepcopy__ method to work around this? How?

from copy import deepcopy


class Node:

    def __init__(self, p_id):
        self.id = p_id
        self.edge_dict = {}
        self.degree = 0

    def __eq__(self, other):
        return self.id == other.id

    def __hash__(self):
        return hash(self.id)

    def add_edge(self, p_node, p_data):
        if p_node not in self.edge_dict:
            self.edge_dict[p_node] = p_data
            self.degree += 1
            return True
        else:
            return False

if __name__ == '__main__':
    node1 = Node(1)
    node2 = Node(2)
    node1.add_edge(node2, "1->2")
    node2.add_edge(node1, "2->1")
    node1_copy = deepcopy(node1)

File ".../node_test.py", line 15, in __hash__
    return hash(self.id)
AttributeError: 'Node' object has no attribute 'id'

There is 1 answer

ShadowRanger (best answer)

Cyclic dependencies are a problem for deepcopy when you:

  1. Have classes that must be hashed and contain reference cycles, and
  2. Don't ensure hash-related (and equality-related) invariants are established at object construction, not just initialization

The problem is that deepcopy, by default, copies custom objects with the same machinery pickle uses (__reduce_ex__), unless a special __deepcopy__ method is defined. That machinery creates the empty object without initializing it (__init__ is never called), then fills in its attributes from a deep copy of the original's state. To copy node1's attributes it has to copy node2, whose edge_dict in turn refers back to the partially created copy of node1. When that half-built copy is inserted as a key into the new edge_dict, it doesn't have its id attribute set yet, so the attempt to hash it fails.
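The failure can be reproduced in isolation, without deepcopy at all. The following is only a sketch of the first step the copy machinery takes (allocating an instance without running __init__); the names blank, has_id and hash_error are made up for this illustration, and the Node class is a trimmed copy of the one in the question.

```python
class Node:
    def __init__(self, p_id):
        self.id = p_id
        self.edge_dict = {}

    def __eq__(self, other):
        return self.id == other.id

    def __hash__(self):
        return hash(self.id)


# Allocate an instance without calling __init__, which is roughly
# what the default copy path does before it restores any state.
blank = Node.__new__(Node)
has_id = hasattr(blank, "id")  # the id attribute was never set

# Using the half-built object as a dict key calls __hash__,
# which reads self.id and therefore fails.
try:
    {blank: "edge"}
    hash_error = None
except AttributeError as exc:
    hash_error = exc
```

In the cyclic case, the half-built copy of node1 plays the role of blank: it is already in the memo, so it gets reused as a key while node2's edge_dict is being copied.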

You can correct this by using __new__ to ensure invariants are established prior to initializing mutable, possibly recursive attributes, and defining the pickle helper __getnewargs__ (or __getnewargs_ex__) so that copying and unpickling pass the right arguments to __new__. Specifically, change your class definition to:

class Node:
    # __new__ instead of __init__ to establish necessary id invariant
    # You could use both __new__ and __init__, but that's usually more complicated
    # than you really need
    def __new__(cls, p_id):
        self = super().__new__(cls)  # Must explicitly create the new object
        # Aside from explicit construction and return, rest of __new__
        # is same as __init__
        self.id = p_id
        self.edge_dict = {}
        self.degree = 0
        return self  # __new__ returns the new object

    def __getnewargs__(self):
        # Return the arguments that *must* be passed to __new__
        return (self.id,)

    # ... rest of class is unchanged ...

Note: If this is Python 2 code, make sure to explicitly inherit from object and change super() to super(Node, cls) in __new__; the code given is the simpler Python 3 code.
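For a quick check, here is the __new__-based class assembled into one runnable Python 3 script together with the two-node cycle from the question. This is a sketch rather than a definitive implementation; node2_copy and cycle_preserved are names introduced only for this check.

```python
from copy import deepcopy


class Node:
    def __new__(cls, p_id):
        # id is set at construction, so even a not-yet-restored copy is hashable
        self = super().__new__(cls)
        self.id = p_id
        self.edge_dict = {}
        self.degree = 0
        return self

    def __getnewargs__(self):
        # Arguments that copying/unpickling must pass to __new__
        return (self.id,)

    def __eq__(self, other):
        return self.id == other.id

    def __hash__(self):
        return hash(self.id)

    def add_edge(self, p_node, p_data):
        if p_node not in self.edge_dict:
            self.edge_dict[p_node] = p_data
            self.degree += 1
            return True
        return False


node1 = Node(1)
node2 = Node(2)
node1.add_edge(node2, "1->2")
node2.add_edge(node1, "2->1")

node1_copy = deepcopy(node1)  # no AttributeError this time

# The only key in the copied edge_dict is the copy of node2 ...
node2_copy = next(iter(node1_copy.edge_dict))
# ... and its own edge_dict points back at the copy of node1, not the original.
cycle_preserved = next(iter(node2_copy.edge_dict)) is node1_copy
```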

An alternative that handles only copy.deepcopy, without supporting pickling and without requiring __new__/__getnewargs__ (which require new-style classes), is to override deep copying alone. Define the following extra method on your original class and otherwise leave the class untouched. The method calls copy.deepcopy, so make sure the module has import copy (the question's from copy import deepcopy alone is not enough):

def __deepcopy__(self, memo):
    # Deepcopy only the id attribute, then construct the new instance and map
    # the id() of the existing object to the new instance in the memo dictionary
    memo[id(self)] = newself = self.__class__(copy.deepcopy(self.id, memo))
    # Now that memo is populated with a hashable instance, copy the other attributes:
    newself.degree = copy.deepcopy(self.degree, memo)
    # Safe to deepcopy edge_dict now, because backreferences to self will
    # be remapped to newself automatically
    newself.edge_dict = copy.deepcopy(self.edge_dict, memo)
    return newself
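Put together with the original __init__-based class, the __deepcopy__ approach might look like the following sketch. It reuses the question's class unchanged apart from the added method; node2_copy and cycle_preserved are illustrative names that do not appear in the original post.

```python
import copy


class Node:
    def __init__(self, p_id):
        self.id = p_id
        self.edge_dict = {}
        self.degree = 0

    def __eq__(self, other):
        return self.id == other.id

    def __hash__(self):
        return hash(self.id)

    def add_edge(self, p_node, p_data):
        if p_node not in self.edge_dict:
            self.edge_dict[p_node] = p_data
            self.degree += 1
            return True
        return False

    def __deepcopy__(self, memo):
        # Register a fully hashable copy in memo before touching edge_dict
        memo[id(self)] = newself = self.__class__(copy.deepcopy(self.id, memo))
        newself.degree = copy.deepcopy(self.degree, memo)
        # Back-references to self now resolve to newself through memo
        newself.edge_dict = copy.deepcopy(self.edge_dict, memo)
        return newself


node1 = Node(1)
node2 = Node(2)
node1.add_edge(node2, "1->2")
node2.add_edge(node1, "2->1")

node1_copy = copy.deepcopy(node1)
node2_copy = next(iter(node1_copy.edge_dict))
cycle_preserved = next(iter(node2_copy.edge_dict)) is node1_copy

# Mutating the copy leaves the original graph untouched
node1_copy.add_edge(Node(3), "1->3")
```

The ordering inside __deepcopy__ is the point of the design: the memo entry is created while the new object already has its id, so any cycle that leads back to self finds a key that can be hashed.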