I am getting the following error when I deserialize a causalml (0.10.0) model on Linux (x86-64) that was serialized on OS X (darwin):
ValueError: numpy.ndarray size changed, may indicate binary incompatibility.
Unexpectedly, trying to deserialize it again in the same Python session does succeed!
The environment
On the serializing machine:
- Python 3.8, in a poetry .venv
- numpy 1.18.5 (the latest version compatible with causalml 0.10.0)
- OS X
On the deserializing machine:
- Docker, based on the AWS Lambda Python 3.8 image
- Python 3.8
- Linux x86_64
Both have cython version 0.28 and causalml version 0.10.0.
With cython version 0.29.26 (compatible according to pip), rerunning does not succeed either.
The error gets raised in causaltree.cpython-38-x86_64-linux-gnu.so.
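To rule out version drift between the two machines, here is a minimal parity check that can be run on both sides (a sketch; importlib.metadata ships with Python 3.8):

from importlib.metadata import version

# Print the installed versions on both machines to compare them side by side.
for pkg in ("numpy", "Cython", "causalml"):
    print(pkg, version(pkg))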
Joblib or Pickle
I tried both Python's pickle and joblib; both raise the error.
In the case of using joblib, the following stack trace occurs:
File "/var/task/joblib/numpy_pickle.py", line 577, in load
obj = _unpickle(fobj)
File "/var/task/joblib/numpy_pickle.py", line 506, in _unpickle
obj = unpickler.load()
File "/var/lang/lib/python3.8/pickle.py", line 1212, in load
dispatch[key[0]](self)
File "/var/lang/lib/python3.8/pickle.py", line 1537, in load_stack_global
self.append(self.find_class(module, name))
File "/var/lang/lib/python3.8/pickle.py", line 1579, in find_class
__import__(module, level=0)
File "/var/task/causalml/inference/tree/__init__.py", line 3, in <module>
from .causaltree import CausalMSE, CausalTreeRegressor
File "__init__.pxd", line 238, in init causalml.inference.tree.causaltree
Using a more recent numpy version
Other answers mention that upgrading (in the deserializing environment) to a more recent numpy, which should be backwards compatible, could help. In my case it did not.
After installing causalml, I separately ran pip3 install --upgrade numpy==XXX to replace the numpy version on the deserializing machine.
- With both numpy 1.18.5 and 1.19.5, the error mentions: Expected 96 from C header, got 80 from PyObject
- With numpy 1.20.3, the error mentions: Expected 96 from C header, got 88 from PyObject
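For context on those numbers: if I understand the check correctly, "Expected ... from C header" is the ndarray struct size baked into the compiled extension at build time, while "got ... from PyObject" is the size of the ndarray type in the numpy installed at runtime. The latter can be inspected directly (a small illustrative snippet, not the actual check):

import numpy as np

# tp_basicsize of the runtime ndarray type; this is the
# "got ... from PyObject" number in the error message.
print("ndarray basicsize at runtime:", np.ndarray.__basicsize__)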
Can other numpy arrays be serialized & deserialized? Yes
To verify that numpy serialization & deserialization is actually possible, I've tested serializing a random array (both with pickle and joblib):
import pickle, joblib
import numpy as np

arr = np.random.rand(100)  # a random test array; path is a pathlib.Path
with open(str(path / "numpy.pkl"), "wb") as f:
    pickle.dump(arr, f, protocol=5)
with open(str(path / "numpy.joblib"), "wb") as f:
    joblib.dump(arr, f, compress=True)
These actually deserialize without errors:
with open(str(path / "numpy.pkl"), "rb") as f:
    read_object = pickle.load(f)
with open(str(path / "numpy.joblib"), "rb") as f:
    read_object = joblib.load(f)
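And to be extra sure the round trip is lossless, the arrays can be compared (assuming the array from the dump snippet above is still named arr):

import numpy as np

# The deserialized array should match the original exactly.
assert np.array_equal(arr, read_object)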
Numpy source
If I look at the source code of numpy at this line, it seems the error only gets raised when the retrieved size is bigger than the expected size.
Some other (older) Stack Overflow answers mention that the warnings can be silenced as follows. But that didn't help either (perhaps unsurprisingly, since what I'm seeing is a raised ValueError, not a warning):
import warnings
warnings.filterwarnings("ignore", message="numpy.dtype size changed")
warnings.filterwarnings("ignore", message="numpy.ufunc size changed")
Trying twice solves it
I found one way to solve this: in the same Python session, load the serialized model twice. The first time raises the error; the second time it does not.
The loaded model then does behave as expected.
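In code, the workaround looks roughly like this (a minimal sketch; the function name and model path are just illustrative):

import joblib

def load_model_with_retry(model_path):
    # The first attempt fails while importing causalml's compiled extension,
    # but a second attempt in the same session then succeeds, as observed.
    try:
        return joblib.load(model_path)
    except ValueError:
        return joblib.load(model_path)

model = load_model_with_retry(path / "model.joblib")  # hypothetical path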
What is happening? And is there a way to make it succeed the first time?