For more setup, see this question. I want to create lots of instances of class Toy, in parallel. Then I want to write them to an XML tree.
import itertools
import pandas as pd
import lxml.etree as et
import numpy as np
import sys
import multiprocessing as mp


def make_toys(df):
    l = []
    for index, row in df.iterrows():
        toys = [Toy(row) for _ in range(row['number'])]
        l += [x for x in toys if x is not None]
    return l


class Toy(object):
    def __new__(cls, *args, **kwargs):
        if np.random.uniform() <= 1:
            return super(Toy, cls).__new__(cls, *args, **kwargs)

    def __init__(self, row):
        self.id = None
        self.type = row['type']

    def set_id(self, x):
        self.id = x

    def write(self, tree):
        et.SubElement(tree, "toy", attrib={'id': str(self.id), 'type': self.type})


if __name__ == "__main__":
    table = pd.DataFrame({
        'type': ['a', 'b', 'c', 'd'],
        'number': [5, 4, 3, 10]})
    n_cores = 2
    split_df = np.array_split(table, n_cores)
    p = mp.Pool(n_cores)
    pool_results = p.map(make_toys, split_df)
    p.close()
    p.join()
    l = [a for L in pool_results for a in L]
    box = et.Element("box")
    box_file = et.ElementTree(box)
    for i, toy in itertools.izip(range(len(l)), l):
        Toy.set_id(toy, i)
    [Toy.write(x, box) for x in l]
    box_file.write(sys.stdout, pretty_print=True)
This code runs beautifully. But I redefined the __new__ method to only have a random chance of instantiating the class. So if I set if np.random.uniform() < 0.5, I want to create half as many instances as I asked for, randomly determined. Doing this returns the following error:
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 380, in _handle_results
    task = get()
AttributeError: 'NoneType' object has no attribute '__dict__'
I don't know what this even means, or how to avoid it. If I do this process monolithically, as in l = make_toys(table), it runs well for any random chance.
Another solution
By the way, I know that this can be solved by leaving the __new__ method alone and instead rewriting make_toys() as:
def make_toys(df):
    l = []
    for index, row in df.iterrows():
        prob = np.random.binomial(row['number'], 0.1)
        toys = [Toy(row) for _ in range(prob)]
        l += [x for x in toys if x is not None]
    return l
But I'm trying to learn about the error.
I think you've uncovered a surprising "gotcha" caused by Toy instances becoming None as they are passed through the multiprocessing Pool's result Queue.

The multiprocessing.Pool uses Queue.Queues to pass results from the subprocesses back to the main process. Per the docs, an object put on such a queue is pickled before being sent through the underlying pipe.
While the actual serialization might be different, in spirit the pickling of an instance of Toy becomes a stream of bytes in which the module and class of the object are mentioned: __main__\nToy. The class itself is not pickled. There is only a reference to the name of the class.
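A minimal sketch of this point, assuming a simplified stand-in for Toy (the `kind` attribute, the no-argument `write` method, and the choice of pickle protocol 2 are illustrative assumptions, not part of the original code). It checks that the pickled bytes name the class by module and class name, carry the instance's state, and contain none of the class's own code:

```python
import pickle


class Toy(object):
    """Simplified stand-in for the question's Toy (no pandas row needed)."""

    def __init__(self, kind):
        self.kind = kind

    def write(self):
        # The method lives on the class, not in the pickled instance data.
        return "toy of kind %s" % self.kind


# Protocol 2 is an arbitrary choice for this illustration.
data = pickle.dumps(Toy("a"), protocol=2)

# The stream names the class as "<module>\n<class name>" ...
class_reference = (Toy.__module__ + "\nToy").encode("ascii")
print(class_reference in data)

# ... and carries the instance state, but none of the class's code.
print(b"kind" in data)
print(b"write" in data)
```

Because only the name travels, the receiving process has to look the class up again and build a fresh object from it.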
When the stream of bytes is unpickled on the other side of the pipe, Toy.__new__ is called to instantiate a new instance of Toy. The new object's __dict__ is then reconstituted using unpickled data from the byte stream. When the new object is None, it has no __dict__ attribute, and hence the AttributeError is raised.

Thus, as a Toy instance is passed through the Queue, it might become None on the other side. I believe this is the reason why using if np.random.uniform() < 0.5 in Toy.__new__ leads to AttributeError: 'NoneType' object has no attribute '__dict__'.
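The same failure can be reproduced without a Pool. The sketch below is an illustration under assumptions: it replaces the random check with a hypothetical `return_none` flag so the outcome is deterministic, adds a hypothetical `new_calls` counter, and uses pickle protocol 2 directly instead of the Pool's own transport.

```python
import pickle


class Toy(object):
    """Stand-in for the question's Toy with a deterministic switch."""

    return_none = False  # hypothetical flag replacing the random check
    new_calls = 0        # counts how often __new__ runs

    def __new__(cls, *args, **kwargs):
        Toy.new_calls += 1
        if cls.return_none:
            return None
        return super(Toy, cls).__new__(cls)

    def __init__(self, kind):
        self.kind = kind


toy = Toy("a")                        # first call to __new__
data = pickle.dumps(toy, protocol=2)  # pickling does not call __new__

Toy.return_none = True                # from now on __new__ yields None
try:
    pickle.loads(data)                # unpickling calls Toy.__new__ again
    error = None
except AttributeError as exc:
    error = exc

print(Toy.new_calls)
print(repr(error))
```

The counter increases during `pickle.loads`, which is consistent with the unpickler calling `Toy.__new__`, and the caught exception is the AttributeError raised when the unpickler tries to restore state onto None.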
If you add logging to your script, you will find that the AttributeError only occurs after a logging message containing Returning None. Notice that the logging message comes from the MainProcess, not one of the PoolWorker processes. Since the Returning None message comes from Toy.__new__, this shows that Toy.__new__ was called by the main process. This corroborates the claim that unpickling is calling Toy.__new__ and transforming instances of Toy into None.

The moral of the story is that for Toy instances to be passed through a multiprocessing Pool's Queue, Toy.__new__ must always return an instance of Toy. And as you noted, the code can be fixed by instantiating only the desired number of Toys in make_toys.

By the way, it is non-standard to call instance methods with Toy.write(x, box) when x is an instance of Toy. The preferred way is to use x.write(box). Similarly, use toy.set_id(i) instead of Toy.set_id(toy, i).
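As a rough sketch of that fix, the following keeps the default __new__ and moves the random decision into make_toys. Several things here are assumptions made to keep the example self-contained: rows are plain dicts rather than DataFrame rows, the binomial draw uses the standard library's `random` instead of `np.random.binomial`, the `chance` parameter and `kind` attribute are hypothetical names, and a pickle round-trip stands in for the Pool passing a worker's result back.

```python
import pickle
import random


class Toy(object):
    """Toy with the default __new__, so it always unpickles cleanly."""

    def __init__(self, kind):
        self.id = None
        self.kind = kind

    def set_id(self, x):
        self.id = x


def make_toys(rows, chance=0.5):
    """Create each requested toy with probability `chance`.

    `rows` is a list of dicts standing in for the DataFrame rows.
    """
    toys = []
    for row in rows:
        # Binomial draw: count the successes among row["number"] trials.
        count = sum(random.random() < chance for _ in range(row["number"]))
        toys.extend(Toy(row["type"]) for _ in range(count))
    return toys


rows = [{"type": "a", "number": 5}, {"type": "b", "number": 4}]
toys = make_toys(rows)

# Round-trip through pickle, as the Pool would do with a worker's result.
received = pickle.loads(pickle.dumps(toys, protocol=2))

# Call instance methods on the instance itself.
for i, toy in enumerate(received):
    toy.set_id(i)
```

Since every pickled object is a real Toy and __new__ never returns None, the objects that come back are all Toy instances, whatever value `chance` takes.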