Unexplained behaviour from vectorized partial function using `numpy` and `functools`


I am trying to vectorize a partial function which takes two arguments, both of them lists, then does something to the pairwise elements from the lists (using zip). However, I am finding some unexpected behaviour.

Consider the following code:

import functools
import numpy as np

def f(l1,l2):
    l1 = l1 if isinstance(l1,list) or isinstance(l1,np.ndarray) else [l1]
    l2 = l2 if isinstance(l2,list) or isinstance(l2,np.ndarray) else [l2]
    for e1,e2 in zip(l1,l2):
        print(e1,e2)

f(['a','b'],[1,2])

fp = functools.partial(f,l1=['a','b'])
fp(l2=[1,2])

fv = np.vectorize(fp)
fv(l2=np.array([1,2]))

The output from the Jupyter notebook is as follows:

a 1
b 2

a 1
b 2

a 1
a 1
a 2
array([None, None], dtype=object)

I have two questions:

  • First, the type check at the beginning of f is necessary because np.vectorize seems to automatically fully flatten any input (I get an "int32 not iterable" exception otherwise). Is there a way to avoid this?
  • Secondly, when the partial function fp is vectorized, the output is clearly not the expected one. I am not sure I understand what NumPy is doing here, including the final array of None values. No matter how much I nest [1,2] within a list, tuple or array, the output always seems to be the same. How can I fix my code so that the vectorized function fv behaves as expected, that is, the same as fp?

Edit
Another try I have done is:

fpv(l2=[np.array([1,2]), np.array([3,4])])

whose output is:

a 1
a 1
a 2
a 3
a 4
2

There are 2 answers

1
Freek Wiekmeijer On

After the changes to isinstance I analyzed further:

import functools
import numpy as np

def f(l1,l2):
    print('raw', l1, l2)
    l1 = l1 if isinstance(l1,list) or isinstance(l1,np.ndarray) else [l1]
    l2 = l2 if isinstance(l2,list) or isinstance(l2,np.ndarray) else [l2]
    print('preprocessed', l1, l2)
    print('zipped:')
    for e1,e2 in zip(l1,l2):
        print(e1,e2)

print('\ntwo lists')
f(['a','b'],[1,2])

print('\nl1 supplied through funcools.partial')
fp = functools.partial(f,l1=['a','b'])
fp(l2=[1,2])

print('\nvectorized')
fv = np.vectorize(fp)
fv(l2=np.array([1,2]))

Output:

two lists
raw ['a', 'b'] [1, 2]
preprocessed ['a', 'b'] [1, 2]
zipped:
a 1
b 2

As expected. Two lists [a, b] and [1, 2] zipped together.

l1 supplied through funcools.partial
raw ['a', 'b'] [1, 2]
preprocessed ['a', 'b'] [1, 2]
zipped:
a 1
b 2

Same as above, functools.partial just wraps the function with two args into a function with one arg injected by functools and one exposed. Same input, same output.

vectorized
raw ['a', 'b'] 1
preprocessed ['a', 'b'] [1]
zipped:
a 1
raw ['a', 'b'] 1
preprocessed ['a', 'b'] [1]
zipped:
a 1
raw ['a', 'b'] 2
preprocessed ['a', 'b'] [2]
zipped:
a 2

This is what I would expect vectorize to do: map the function fp over the members of the input l2.

So I would expect the underlying function calls to f():

f(l1=['a', 'b'], l2=1)
(with expected output from zip(): a 1)
f(l1=['a', 'b'], l2=2)
(with expected output from zip(): a 2)

This is almost what we see happen, except that the first call is made twice.

Minimal reproduction scenario:

import numpy as np

np.vectorize(print)(np.array([1,2,3]))

Prints 4 lines: 1, 1, 2, 3.

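So the unexpected behaviour comes from np.vectorize itself, not from the partial function: when otypes is not specified, vectorize calls the function once on the first element to determine the output type, and then calls it again on that element during the actual iteration.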

This problem was also addressed in the question "Why does numpy's vectorize function perform twice on the first element".

Here's the fix:

np.vectorize(fp, otypes=['str'])(l2=np.array([1, 2, 3]))

Specifying the otypes will eliminate the extra calculation over the first element in the vector.
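A minimal sketch to check this claim, counting the calls with and without otypes. The helper make_counter and the variable names are hypothetical, introduced only for this illustration. It passes otypes=[object] rather than 'str' on the assumption that the wrapped function returns None, as f does in the question.

```python
import numpy as np

def make_counter():
    # Hypothetical helper: returns a list plus a function that logs its input to it.
    seen = []

    def record(x):
        seen.append(int(x))
        return None

    return seen, record

# Without otypes, vectorize makes a trial call on the first element.
seen_default, record_default = make_counter()
np.vectorize(record_default)(np.array([1, 2, 3]))

# With otypes, the output type is known up front, so no trial call is needed.
seen_otypes, record_otypes = make_counter()
np.vectorize(record_otypes, otypes=[object])(np.array([1, 2, 3]))

print(seen_default)
print(seen_otypes)
```

The first list should contain the first element twice, while the second list should contain each element once.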

0
hpaulj On

I'm not entirely sure why you are trying to compare partial and vectorize. They have entirely different purposes.

partial just lets us specify one argument ahead of time. It does nothing specific to numpy.

Let's modify the function to display more information about the inputs and the iteration.

In [75]: def f(l1,l2):
    ...:     print('inputs ',l1,l2)
    ...:     l1 = l1 if isinstance(l1,list) or isinstance(l1,np.ndarray) else [l1]
    ...:     l2 = l2 if isinstance(l2,list) or isinstance(l2,np.ndarray) else [l2]
    ...:     for i,(e1,e2) in enumerate(zip(l1,l2)):
    ...:         print(i,e1,e2)
    ...:     return i
    ...:     

And applied to your sample lists:

In [76]: f(['a','b'],[1,2])
inputs  ['a', 'b'] [1, 2]
0 a 1
1 b 2
Out[76]: 1

So it gets 2 list inputs, and iterates on their zipped values.

Putting that in vectorize:

In [77]: fv1 = np.vectorize(f)

In [78]: fv1(['a','b'],[1,2])
inputs  a 1
0 a 1
inputs  a 1
0 a 1
inputs  b 2
0 b 2
Out[78]: array([0, 0])

The inputs are quite different. Instead of 2 lists, f gets called several times, each time with a pair of scalar values. The first call with a 1 is the trial call that vectorize uses to determine the return dtype. I return a number here; you returned None.

Due to your if lines, the scalars are converted to single-element lists, and it does one iteration.

Just for fun, let's make one input a list of lists:

In [79]: fv1([['a'],['b']],[1,2])
inputs  a 1
0 a 1
inputs  a 1
0 a 1
inputs  a 2
0 a 2
inputs  b 1
0 b 1
inputs  b 2
0 b 2
Out[79]: 
array([[0, 0],
       [0, 0]])

Now f gets called five times: the trial call plus one call for each element of the (2,2) broadcast result, all with scalar inputs.
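The (2,2) shape comes from ordinary NumPy broadcasting of a (2,1) input against a (2,) input. As a small sketch (the variable names are made up for illustration), np.broadcast shows both the resulting shape and the order in which the scalar pairs are produced:

```python
import numpy as np

letters = np.array([['a'], ['b']])  # shape (2, 1)
numbers = np.array([1, 2])          # shape (2,)

# np.broadcast pairs the inputs elementwise after broadcasting them together.
shape = np.broadcast(letters, numbers).shape
pairs = [(str(l), int(n)) for l, n in np.broadcast(letters, numbers)]

print(shape)
print(pairs)
```

The pairs should appear in the same order as the non-trial calls shown in [79].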

I was going to add an example using the signature parameter, but I don't remember enough of the required syntax.
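For reference, here is a hedged sketch of what a signature-based version might look like. This is not code from the original answer: the function pair_up and the calls list are hypothetical, and the signature string '(n),(n)->()' assumes both inputs are 1-D sequences of equal length and that the function returns a scalar. With a signature, vectorize passes whole 1-D arrays to the function instead of flattening them to scalars.

```python
import numpy as np

calls = []  # records what the wrapped function receives on each call

def pair_up(l1, l2):
    # With a signature, each argument arrives as a whole 1-D array.
    pairs = [(str(e1), int(e2)) for e1, e2 in zip(l1, l2)]
    calls.append(pairs)
    return len(pairs)

# '(n),(n)->()': two 1-D inputs of the same length n, one scalar output.
fv_sig = np.vectorize(pair_up, signature='(n),(n)->()')
result = fv_sig(['a', 'b'], [1, 2])

print(calls)
print(int(result))
```

In this sketch the function sees both lists in full, so the zip inside it pairs a with 1 and b with 2, which is the behaviour the question was after.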

With partial, execution is no different from when I provide 2 lists, as in [76]:

In [83]: fp = functools.partial(f,l1=['a','b'])
    ...: fp(l2=[1,2])
inputs  ['a', 'b'] [1, 2]
0 a 1
1 b 2
Out[83]: 1

With vectorize I could simplify f to expect scalar values, and just display them:

In [84]: def f(l1,l2):
    ...:     print('inputs ',l1,l2)
    ...:     return 0
    ...: fv2 = np.vectorize(f)    

In [85]: fv2(['a','b'],[1,2])
inputs  a 1
inputs  a 1
inputs  b 2
Out[85]: array([0, 0])

In effect vectorize just replaces the iteration that you put in the original f. And it doesn't do it any faster (well, to be picky, vectorize code does scale a bit better than a list comprehension for large arrays). But it is not a true numpy 'vectorization'. Nothing is compiled.
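As a rough illustration of that point, the sketch below compares np.vectorize with the same iteration written by hand. The function label and the variable names are hypothetical. The zip-based loop only matches for inputs of the same shape, since vectorize also broadcasts.

```python
import numpy as np

def label(letter, number):
    # Scalar function: receives one element from each input per call.
    return f"{letter}{number}"

letters = ['a', 'b']
numbers = [1, 2]

# otypes is given so that vectorize skips its trial call on the first element.
via_vectorize = np.vectorize(label, otypes=[object])(letters, numbers)

# The same iteration written out as a plain list comprehension.
via_loop = [label(l, n) for l, n in zip(letters, numbers)]

print(via_vectorize.tolist())
print(via_loop)
```

Both approaches call the Python function once per element pair, which is why vectorize is a convenience rather than a performance tool.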