Broadcasting with concatenate operator


NumPy broadcasting allows basic operations (addition, multiplication, etc.) to be performed on arrays of different shapes, under certain conditions on those shapes. For example:

>>> a = np.array([0.0, 10.0, 20.0, 30.0])
>>> b = np.array([1.0, 2.0, 3.0])
>>> a[:, np.newaxis] + b
array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])

Each element of the output array is the sum of one element from a and one element from b. This can be described as an 'outer addition' operation.
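
The shapes make the broadcasting rule concrete here: the (4, 1) column broadcasts against the (3,) row to give a (4, 3) result:

>>> a[:, np.newaxis].shape
(4, 1)
>>> (a[:, np.newaxis] + b).shape
(4, 3)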

Is there a simple way to do this with concatenation instead? In other words, is there a simple way to perform an 'outer concatenation' operation? With the previous example, the output array would be:

array([[[ 0.,  1.],
        [ 0.,  2.],
        [ 0.,  3.]],

       [[10.,  1.],
        [10.,  2.],
        [10.,  3.]],

       [[20.,  1.],
        [20.,  2.],
        [20.,  3.]],

       [[30.,  1.],
        [30.,  2.],
        [30.,  3.]]])

After a few tries, I came up with this solution using numpy.meshgrid, but it looks quite under-optimized to me and could probably be improved:

>>> a = np.array([0.0, 10.0, 20.0, 30.0])
>>> b = np.array([1.0, 2.0, 3.0])
>>> c,d = np.meshgrid(a,b)
>>> np.concatenate((c.T[..., None], d.T[..., None]), axis=-1)
array([[[ 0.,  1.],
        [ 0.,  2.],
        [ 0.,  3.]],

       [[10.,  1.],
        [10.,  2.],
        [10.,  3.]],

       [[20.,  1.],
        [20.,  2.],
        [20.,  3.]],

       [[30.,  1.],
        [30.,  2.],
        [30.,  3.]]])

Is there a more 'pythonic' way to do this?

1 Answer

Answered by hpaulj

I added a fast broadcasted assignment solution at the end.


What does it mean to be more 'pythonic' (or in this case more 'numpy-onic')?

If it runs and produces the desired result, is that enough? 'One-liners' are stylish in Perl, but not in Python. Broadcasting is a valid concept when using operators like +, but concatenate isn't an operator.

In a sense, then, we are left with comparing speeds. But all speed tests need to be qualified by the size of the arrays.

Anyway, for your test case:

In [33]: %%timeit 
    ...: c,d = np.meshgrid(a,b)
    ...: np.concatenate((c.T[..., None], d.T[..., None]), axis=-1)

82.7 µs ± 337 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
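
As an aside, meshgrid has an indexing='ij' option that returns the grids already in (4, 3) orientation, so the transposes can be dropped. A minimal, untimed variant:

np.stack(np.meshgrid(a, b, indexing='ij'), axis=-1)  # same (4, 3, 2) result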

The broadcast_arrays approach produces two arrays, similar to meshgrid (actually just their transpose):

In [34]: np.broadcast_arrays(a[:, None], b)
Out[34]: 
[array([[ 0.,  0.,  0.],
        [10., 10., 10.],
        [20., 20., 20.],
        [30., 30., 30.]]),
 array([[1., 2., 3.],
        [1., 2., 3.],
        [1., 2., 3.],
        [1., 2., 3.]])]

Speed is slightly better, but not an 'order of magnitude':

In [35]: timeit np.stack(np.broadcast_arrays(a[:, None], b),axis=2)
77.4 µs ± 2.89 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

But we can construct the same broadcasted arrays with repeat (that's a good compiled numpy method):

In [36]: (a[:,None].repeat(3,axis=1), b[None,:].repeat(4, axis=0))
Out[36]: 
(array([[ 0.,  0.,  0.],
        [10., 10., 10.],
        [20., 20., 20.],
        [30., 30., 30.]]),
 array([[1., 2., 3.],
        [1., 2., 3.],
        [1., 2., 3.],
        [1., 2., 3.]]))

And surprise, it's faster:

In [37]: timeit np.stack((a[:,None].repeat(3,axis=1), b[None,:].repeat(4, axis=0)),axis=2)
29.7 µs ± 99.6 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
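
The hard-coded 3 and 4 can be read off the inputs instead, which makes the same idea reusable for other sizes. A sketch (the function name outer_concat_repeat is mine, not part of the original answer):

import numpy as np

def outer_concat_repeat(a, b):
    # Repeat a down a new column axis and b across a new row axis,
    # then stack the two (len(a), len(b)) grids along a last axis of size 2.
    aa = a[:, None].repeat(b.size, axis=1)   # shape (len(a), len(b))
    bb = b[None, :].repeat(a.size, axis=0)   # shape (len(a), len(b))
    return np.stack((aa, bb), axis=2)        # shape (len(a), len(b), 2)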

So which is more 'pythonic'? For this small example the repeat is faster, even if it is 'wordier'. My guess is that for much larger arrays, the broadcast_arrays will scale better.

broadcasting code

Someone found and down-voted my answer to a previous broadcasting question:

what happens under the hood of broadcasting a numpy array

The comments there suggest looking at the source code in np.lib.stride_tricks.

broadcast_arrays is Python code, and uses _broadcast_to. That in turn uses np.nditer, which is known to be slow, at least via its Python interface.

The arrays produced by meshgrid and my repeats are copies. But the arrays produced by broadcast_arrays (Out[34]) are views, created from a and b by playing with the shape and strides. Normally making a view is fast, since it doesn't have to copy the base values. But my guess is that the setup code for the stride_tricks functions is time-consuming.

In [44]: _34[0].base
Out[44]: array([ 0., 10., 20., 30.])

In [45]: _34[1].base
Out[45]: array([1., 2., 3.])

In [46]: _34[0].strides, _34[1].strides
Out[46]: ((8, 0), (0, 8))
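
The same zero-stride views can be made directly with np.broadcast_to, the public wrapper around this machinery; a minimal sketch:

np.broadcast_to(a[:, None], (4, 3))  # read-only view of a, strides (8, 0)
np.broadcast_to(b, (4, 3))           # read-only view of b, strides (0, 8)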

As suggested by the module name, stride_tricks, using this broadcast_arrays function is a nice, cute trick, but not essential to understanding or using numpy, and especially not the normal use of broadcasting with operators.

broadcasted assignment

Here's an even better version. Instead of using concatenate, it makes the target array directly, and fills it in with values from a and b:

In [75]: timeit res=np.empty((4,3,2)); res[:,:,0]=a[:,None]; res[:,:,1]=b
4.56 µs ± 14 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
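
Wrapped as a function, with the shapes and dtype taken from the inputs, the same assignment idea looks like this (the name outer_concat is mine, not from the original answer):

def outer_concat(a, b):
    # Allocate the (len(a), len(b), 2) target once, then let broadcasting
    # fill each channel: a varies down the rows, b across the columns.
    res = np.empty((a.size, b.size, 2), dtype=np.result_type(a, b))
    res[:, :, 0] = a[:, None]
    res[:, :, 1] = b
    return res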

This does utilize broadcasting, as can be seen with an erroneous assignment:

In [76]: res=np.empty((4,3,2)); res[:,:,0]=a
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[76], line 1
----> 1 res=np.empty((4,3,2)); res[:,:,0]=a

ValueError: could not broadcast input array from shape (4,) into shape (4,3)

Broadcasting is not just used by binary operators; it is also used during advanced indexing and, as shown here, when assigning values to a slice of an array.

Here the (4,1) a can fit the (4,3) slot, and the (3,) b can also fit a slot of that shape.
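
The shape compatibility can be checked explicitly with np.broadcast_shapes (available since NumPy 1.20):

np.broadcast_shapes((4, 1), (4, 3))  # (4, 3): a[:, None] fits the slot
np.broadcast_shapes((3,), (4, 3))    # (4, 3): b fits it too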