Dill sometimes makes attributes of objects disappear


I have a class (Protpkg, "class 1") with a custom __new__ method that takes an object of a different custom class (State, "class 2") as an argument and saves it as an instance attribute. The initialization (__init__) of class 1 then uses a multiprocess Pool to run one of its own methods in parallel (three calls to apply_async). Note that I use multiprocess instead of multiprocessing, so that dill does the serialization and pickling the class's methods is not a problem.

For each execution of the class 1 method inside the Pool, I assume a copy of the namespace is made, meaning that the current instance of class 1, and the instance of class 2 that was passed to it and saved as an attribute, are pickled and unpickled by dill. However, dill sometimes seems to fail to pickle and unpickle the class 2 object correctly, and its attributes disappear. As a result, when the class 1 object is being reconstructed and its __new__ tries to access the attributes of the class 2 object, an AttributeError is raised because those attributes are missing:

Process ForkPoolWorker-3:
Traceback (most recent call last):
  File "/home/fnerin/miniconda3/envs/AlloViz/lib/python3.9/site-packages/multiprocess/process.py", line 315, in _bootstrap
    self.run()
  File "/home/fnerin/miniconda3/envs/AlloViz/lib/python3.9/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/fnerin/miniconda3/envs/AlloViz/lib/python3.9/site-packages/multiprocess/pool.py", line 114, in worker
    task = get()
  File "/home/fnerin/miniconda3/envs/AlloViz/lib/python3.9/site-packages/multiprocess/queues.py", line 371, in get
    return _ForkingPickler.loads(res)
  File "/home/fnerin/miniconda3/envs/AlloViz/lib/python3.9/site-packages/dill/_dill.py", line 327, in loads
    return load(file, ignore, **kwds)
  File "/home/fnerin/miniconda3/envs/AlloViz/lib/python3.9/site-packages/dill/_dill.py", line 313, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/home/fnerin/miniconda3/envs/AlloViz/lib/python3.9/site-packages/dill/_dill.py", line 525, in load
    obj = StockUnpickler.load(self)
  File "/mnt/c/Users/franx/Desktop/TFM/bug/Pkgs.py", line 42, in __new__
    new = super().__new__(cls, state, **kwargs)
  File "/mnt/c/Users/franx/Desktop/TFM/bug/Pkgs.py", line 13, in __new__
    new._pdbf = new.state._pdbf
AttributeError: 'State' object has no attribute '_pdbf'
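
For illustration, the same AttributeError can be produced with the standard pickle module alone. This is only a minimal sketch under an assumption that has not been checked against the repository: that the State object holds a reference back to the object wrapping it, so the two form a cycle. The class Pkg, the pkg attribute and the __getnewargs__ method are hypothetical stand-ins, not the actual code of the bug package.

```python
import pickle


class State:
    """Hypothetical stand-in for class 2."""

    def __init__(self, pdbf):
        self._pdbf = pdbf
        self.pkg = None  # assumed back-reference to the wrapping object


class Pkg:
    """Hypothetical stand-in for class 1, with a __new__ that needs a State."""

    def __new__(cls, state):
        new = super().__new__(cls)
        new.state = state
        # Reads an attribute of the State while the object is being created.
        new._pdbf = new.state._pdbf
        return new

    def __getnewargs__(self):
        # Tells pickle to call __new__(cls, state) when unpickling.
        return (self.state,)


state = State("protein.pdb")
state.pkg = Pkg(state)  # State -> Pkg -> State cycle

try:
    pickle.loads(pickle.dumps(state))
    outcome = "ok"
except AttributeError as exc:
    # Pkg.__new__ received a State whose __dict__ was not restored yet.
    outcome = str(exc)

print(outcome)
```

In this sketch the unpickler first creates an empty State, then starts rebuilding its attributes. One of those attributes is the Pkg, so Pkg.__new__ is called with the still-empty State before its __dict__ has been restored. Whether this is what happens in the real code is an open question.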

I have made a GitHub repo with reproducible code: a Python package named bug that contains these classes and reproduces the error, along with a testing notebook and a testing Python script (either can be used): https://github.com/frannerin/bug. I am using a Linux CentOS 7.5.1804 machine and a Miniconda3 4.11.0 environment with Python 3.9.12 and multiprocess 0.70.12.2 (with dill 0.3.4).

To complicate matters, I have observed that sometimes the first task sent to the Pool works correctly (the class 2 object has all its attributes) while the other two fail. Also, if I place time.sleep(5) after each call to apply_async, all of the tasks usually work correctly.

Even if you don't provide a complete solution with actual code, I am open to ideas to try and to look into myself. Thanks!
