'ValueError: numpy.ndarray size changed, may indicate binary incompatibility' - but 2nd attempt succeeds

I am getting the following error when I deserialize a causalml (0.10.0) model on Linux (x86-64) that was serialized on OS X (Darwin):

ValueError: numpy.ndarray size changed, may indicate binary incompatibility.

Unexpectedly, trying to deserialize it again in the same Python session does succeed!

The environment

On the serializing machine:

  • Python 3.8, in a poetry .venv
  • numpy 1.18.5 (the latest version compatible with causalml 0.10.0)
  • OS X (Darwin)

On the deserializing machine:

  • Docker, based on the AWS Lambda Python 3.8 image
  • Python 3.8
  • Linux x86_64

Both have cython version 0.28, causalml version 0.10.0.

With Cython version 0.29.26 (which pip considers compatible), rerunning does not succeed either.

The error gets raised in the causaltree.cpython-38-x86_64-linux-gnu.so.

Joblib or Pickle

I tried both Python's pickle and joblib; both raise the error.
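
For reference, this is roughly how the model gets loaded on the deserializing machine; the path and file names below are illustrative, not the actual ones:

from pathlib import Path
import pickle
import joblib

path = Path(".")  # wherever the serialized model lives; location and names are illustrative

with open(str(path / "model.pkl"), "rb") as f:
    model = pickle.load(f)                        # raises the ValueError on the first attempt

model = joblib.load(str(path / "model.joblib"))   # the joblib variant raises the same error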

When using joblib, the following stack trace occurs:

File "/var/task/joblib/numpy_pickle.py", line 577, in load
    obj = _unpickle(fobj)
  File "/var/task/joblib/numpy_pickle.py", line 506, in _unpickle
    obj = unpickler.load()
  File "/var/lang/lib/python3.8/pickle.py", line 1212, in load
    dispatch[key[0]](self)
  File "/var/lang/lib/python3.8/pickle.py", line 1537, in load_stack_global
    self.append(self.find_class(module, name))
  File "/var/lang/lib/python3.8/pickle.py", line 1579, in find_class
    __import__(module, level=0)
  File "/var/task/causalml/inference/tree/__init__.py", line 3, in <module>
    from  .causaltree import CausalMSE, CausalTreeRegressor
  File "__init__.pxd", line 238, in init causalml.inference.tree.causaltree

Using a more recent numpy version

Other answers mention that upgrading (on the deserializing environment) to a more recent numpy, which should be backwards compatible, could help. In my case it did not help.

After installing causalml, I separately run pip3 install --upgrade numpy==XXX to replace the numpy version on the deserializing machine.

  • With both numpy 1.18.5 and 1.19.5, the error mentions: Expected 96 from C header, got 80 from PyObject
  • With numpy 1.20.3, the error mentions: Expected 96 from C header, got 88 from PyObject
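
As a sanity check (not a fix), the numpy version that the Lambda container actually resolves at runtime can be confirmed with a snippet like this:

import numpy
print(numpy.__version__)   # confirm the upgrade actually took effect inside the container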

Can other numpy arrays be serialized & deserialized? Yes

To verify that numpy serialization & deserialization is actually possible, I've tested serializing a random array (both with pickle and joblib):

arr = np.random.rand(100, 10)  # an arbitrary random array (shape chosen arbitrarily)

with open(str(path / "numpy.pkl"), "wb") as f:
    pickle.dump(arr, f, protocol=5)

with open(str(path / "numpy.joblib"), "wb") as f:
    joblib.dump(arr, f, compress=True)

These actually deserialize without errors:

with open(str(path / "numpy.pkl"), "rb") as f:
    read_object = pickle.load(f)

with open(str(path / "numpy.joblib"), "rb") as f:
    read_object = joblib.load(f)

Numpy source

If I look at the source code of numpy at this line, it seems the error only gets raised when the retrieved size is bigger than the expected size.
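
For intuition, here is a rough Python paraphrase of what that binary-compatibility check amounts to. The real check lives in C code generated by Cython and runs when the compiled extension is imported; the variable names and the unconditional raise below are illustrative only:

expected_size = 96   # size of the ndarray struct the .so was compiled against (from the C header)
runtime_size = 80    # size reported by the numpy actually installed at runtime
if runtime_size != expected_size:
    # depending on the direction of the mismatch this may only be a warning,
    # but here it surfaces as the ValueError quoted above
    raise ValueError(
        "numpy.ndarray size changed, may indicate binary incompatibility. "
        f"Expected {expected_size} from C header, got {runtime_size} from PyObject"
    )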

Some other (older) Stack Overflow answers mention that the warnings can be silenced as follows, but this didn't help either:

import warnings

warnings.filterwarnings("ignore", message="numpy.dtype size changed")
warnings.filterwarnings("ignore", message="numpy.ufunc size changed")

Trying twice solves it

I found one way to solve this: in the same Python session, load the serialized model twice. The first time raises the error, the second time it does not.

The loaded model then does behave as expected.
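
A minimal sketch of that workaround, assuming the model was dumped with joblib; the location and file name below are illustrative:

from pathlib import Path
import joblib

model_path = str(Path(".") / "model.joblib")   # illustrative location and file name
try:
    model = joblib.load(model_path)
except ValueError:
    # the first attempt fails with "numpy.ndarray size changed ...";
    # retrying in the same Python session succeeds
    model = joblib.load(model_path)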

What is happening? And is there a way to make it succeed the first time?
