I have a function x, shown below, that takes two numpy arrays as input, and I want it to return a boolean value after some computation.
import numpy as np
def x(a, b):
    print(a)
    print(b)
    # Some computation...
    return boolean_value
wrappedFunc = np.frompyfunc(x, nin=2, nout=1)
arg_a = np.arange(8).reshape(2,4)
# arg_b is a numpy array having shape (2,1)
arg_b = np.array((np.array([[0, 1, 0],
                            [0, 0, 0],
                            [1, 0, 0],
                            [1, 1, 0]]),
                  np.array([[0., 1., 0.],
                            [0., 0., 0.],
                            [1., 0., 0.],
                            [1., 1., 0.],
                            [0.5, 0.5, 0.]])), dtype=object).reshape(2, 1)
wrappedFunc(arg_a, arg_b)
Executing the code above results in the following output.
# Output of a is:
0
1
2
3
4
5
6
7
# output of b is:
[[0 1 0]
[0 0 0]
[1 0 0]
[1 1 0]]
[[0 1 0]
[0 0 0]
[1 0 0]
[1 1 0]]
[[0 1 0]
[0 0 0]
[1 0 0]
[1 1 0]]
[[0 1 0]
[0 0 0]
[1 0 0]
[1 1 0]]
[[0. 1. 0. ]
[0. 0. 0. ]
[1. 0. 0. ]
[1. 1. 0. ]
[0.5 0.5 0. ]]
[[0. 1. 0. ]
[0. 0. 0. ]
[1. 0. 0. ]
[1. 1. 0. ]
[0.5 0.5 0. ]]
[[0. 1. 0. ]
[0. 0. 0. ]
[1. 0. 0. ]
[1. 1. 0. ]
[0.5 0.5 0. ]]
[[0. 1. 0. ]
[0. 0. 0. ]
[1. 0. 0. ]
[1. 1. 0. ]
[0.5 0.5 0. ]]
As you can see, a and b are each printed 8 times. This is not the intended behaviour: I expected the print statements for a and b to run twice each. The expected output from the print(a) and print(b) statements is shown below:
On first call:
a needs to be:[0,1,2,3]
b needs to be:[[0 1 0]
[0 0 0]
[1 0 0]
[1 1 0]]
On second call:
a needs to be:[4,5,6,7]
b needs to be:[[0. 1. 0. ]
[0. 0. 0. ]
[1. 0. 0. ]
[1. 1. 0. ]
[0.5 0.5 0. ]]
What am I doing wrong here?
Let's look at frompyfunc with a simpler b, and compare it to straightforward numpy addition.

The addition of a (2,4) with a (2,1) yields a (2,4). By the rules of broadcasting, the size 1 dimension is 'replicated' to match the 4 of a.

Define a function that simply adds two 'scalars'. As written it works with arrays, including a and b, but imagine having some if lines that only work with scalars.

Using frompyfunc makes a ufunc that can broadcast its arguments, passing scalar values to x. That is why x runs 8 times here: once for each element of the broadcast (2,4) result.

What you seem to want is a zip of the arrays on their first dimension. Note that x here gets (4,) and (1,) shaped arrays, which, again by broadcasting, yield a (4,) result. Those 2 output arrays can be joined to make the same (2,4) as before.
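The three approaches described above (plain broadcasting, frompyfunc, and zip on the first dimension) can be sketched side by side. This is only an illustration: the values in b are a hypothetical stand-in for the "simpler b", and the calls list is a helper added here to count how often x runs.

```python
import numpy as np

a = np.arange(8).reshape(2, 4)      # shape (2, 4)
b = np.array([[10], [100]])         # hypothetical simpler b, shape (2, 1)

# Plain broadcasting: the size-1 axis of b is stretched to match the 4 of a.
summed = a + b                      # shape (2, 4)

calls = []                          # hypothetical helper: records each call's arguments

def x(i, j):
    calls.append((i, j))
    return i + j

# frompyfunc broadcasts a against b and calls x once per scalar pair.
ufunc_x = np.frompyfunc(x, 2, 1)
per_element = ufunc_x(a, b)         # object dtype, shape (2, 4)
n_scalar_calls = len(calls)         # one call per element of the (2, 4) result

# zip pairs the arrays row by row, so x is called once per row
# with a (4,) array and a (1,) array.
calls.clear()
per_row = np.array([x(i, j) for i, j in zip(a, b)])
n_row_calls = len(calls)            # one call per row
```

With the object-dtype arg_b from the question, each j produced by zip would be a length-1 object array, so j[0] would be needed to reach the inner array.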
A related function, vectorize, takes a signature that allows us to specify iteration on the first axis. Getting that right can take some practice (though I got it right on the first try!).

vectorize has a performance disclaimer, and that applies doubly so to the signature version. frompyfunc generally performs better (when it does what we want).

For small arrays, list comprehension usually does better; however, for large arrays, vectorize seems to scale better, and ends up with a modest speed advantage. But to get the best numpy performance it's best to work with the whole arrays (true vectorization), without any of this 'iteration'.
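As a rough sketch of the vectorize signature approach mentioned above: the signature string '(n),(m)->(n)' and the values in b are assumptions chosen for this illustration, not necessarily what was originally used. The shapes_seen list is a hypothetical helper that records what x receives.

```python
import numpy as np

a = np.arange(8).reshape(2, 4)      # shape (2, 4)
b = np.array([[10], [100]])         # hypothetical simpler b, shape (2, 1)

shapes_seen = []                    # hypothetical helper: argument shapes per call

def x(i, j):
    shapes_seen.append((i.shape, j.shape))
    return i + j

# The signature treats the last axis of each argument as a "core" dimension,
# so vectorize iterates over the first axis only and hands x whole rows.
vx = np.vectorize(x, signature='(n),(m)->(n)')
result = vx(a, b)                   # shape (2, 4)
```

Here x receives a (4,) row of a and a (1,) row of b on each call, the same pairing as the zip version.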