There are a few articles showing that MATLAB prefers column operations over row operations, and that performance can vary significantly depending on how you lay out your data. This is apparently because MATLAB uses column-major order for representing arrays.
I remember reading that Python (NumPy) uses row-major order. Given this, my questions are:
- Can one expect a similar difference in performance when working with NumPy?
- If the answer to the above is yes, what would be some examples that highlight this difference?
Like many benchmarks, this really depends on the particulars of the situation. It's true that, by default, numpy creates arrays in C-contiguous (row-major) order, so, in the abstract, operations that scan across the columns of a row (i.e., along the last axis, which is contiguous in memory) should be faster than those that scan down the rows of a column. However, the shape of the array, the performance of the ALU, and the underlying cache on the processor have a huge impact on the particulars.
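You can see the default layout by inspecting an array's flags and strides; a minimal sketch (the shape here is arbitrary):

```python
import numpy as np

a = np.zeros((3, 4))

# Default layout is C-contiguous (row-major).
print(a.flags['C_CONTIGUOUS'])  # True
print(a.flags['F_CONTIGUOUS'])  # False

# strides gives the number of bytes between consecutive elements
# along each axis: stepping along a row (axis 1) moves 8 bytes
# (one float64), while stepping down a column (axis 0) jumps a
# whole row of 4 * 8 = 32 bytes.
print(a.strides)  # (32, 8)
```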
For instance, on my MacBook Pro, with a small integer or float array, the row and column times are similar, but a small integer dtype is significantly slower than the float dtype:
With larger arrays, the absolute differences become larger but, at least on my machine, are still smaller for the larger datatype:
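A benchmark along these lines can be sketched with `timeit`; the shapes and dtypes below are my guesses at "small" and "larger", and the actual numbers will vary by machine:

```python
import timeit
import numpy as np

def bench(a, number=100):
    """Time summing each row (contiguous for C order) vs. each column."""
    t_rows = timeit.timeit(lambda: a.sum(axis=1), number=number)
    t_cols = timeit.timeit(lambda: a.sum(axis=0), number=number)
    return t_rows, t_cols

# Shapes and dtypes are illustrative guesses, not the originals.
for shape in [(100, 100), (2000, 2000)]:
    for dtype in (np.int8, np.float32):
        a = np.ones(shape, dtype=dtype)
        t_rows, t_cols = bench(a)
        print(f'{shape} {np.dtype(dtype).name}: '
              f'rows {t_rows:.4f}s, cols {t_cols:.4f}s')
```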
You can tell numpy to create a Fortran-contiguous (column-major) array using the `order='F'` keyword argument to `numpy.asarray`, `numpy.ones`, `numpy.zeros`, and the like, or by converting an existing array using `numpy.asfortranarray`. As expected, this ordering swaps the relative efficiency of the row and column operations:
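For instance, a minimal sketch of creating and converting column-major arrays (re-running the benchmark on these is left out here):

```python
import numpy as np

# order='F' lays the array out column-major, like MATLAB:
# elements of a single column are now adjacent in memory.
f = np.zeros((1000, 1000), order='F')
print(f.flags['F_CONTIGUOUS'])  # True

# An existing C-ordered array can be converted (this copies the data):
c = np.ones((1000, 1000))
f2 = np.asfortranarray(c)
print(f2.flags['F_CONTIGUOUS'])  # True

# With this layout, scanning down a column reads contiguous memory,
# so column operations get the advantage row operations had before.
```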