Efficient multiple precision numerical arrays


Numpy is a library for efficient numerical arrays.

mpmath, when backed by gmpy, is a library for efficient multiprecision numbers.

How do I put them together efficiently? Or is it already efficient to just use a Numpy array with mpmath numbers?

It doesn't make sense to ask for "as efficient as native floats", but you can ask for something close to the efficiency of equivalent C code (or, failing that, Java/C# code). In particular, an efficient array of multi-precision numbers would mean that you can do vectorized operations and not have the interpreter look up, say, __add__ a million times.

Edit: To the close voter: My question is about an efficient way of putting them together. The answer in the possible duplicate specifically points out that the naive approach is not efficient.

Having a numpy array of dtype=object can be a little misleading, because the powerful numpy machinery that makes operations with the standard dtypes so fast is replaced by the default Python operators of the stored objects, which means the speed advantage is gone.
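For illustration (this example is mine, not from the original posts, and assumes mpmath is installed): the dtype=object approach looks like the following. The arithmetic works, but every element goes through Python-level operator dispatch rather than numpy's compiled loops.

import numpy as np
from mpmath import mp, mpf

mp.prec = 250                      # 250-bit working precision

a = np.array([mpf(i) / 3 for i in range(1000)], dtype=object)
b = np.array([mpf(i) / 7 for i in range(1000)], dtype=object)

c = a * b                          # calls mpf.__mul__ once per element
print(c[1])                        # an mpf, not a float64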


There are 3 answers

Answer by casevh (accepted)

Disclaimer: I maintain gmpy2. The following tests were performed with the development version.

a and b are 1000-element lists containing pseudo-random gmpy2.mpfr values with 250 bits of precision. The test performs element-wise multiplication of the two lists.
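A plausible setup for these lists (not shown in the answer, so the details here are my assumption), using gmpy2's own random-number helpers:

import gmpy2

gmpy2.get_context().precision = 250   # 250-bit mpfr values
rs = gmpy2.random_state(42)

a = [gmpy2.mpfr_random(rs) for _ in range(1000)]
b = [gmpy2.mpfr_random(rs) for _ in range(1000)]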

The first test uses a list comprehension:

%timeit [x*y for x,y in zip(a,b)]
1000 loops, best of 3: 322 µs per loop

The second test uses the map function to perform the looping:

%timeit list(map(gmpy2.mul, a, b))
1000 loops, best of 3: 299 µs per loop

The third test is a C implementation of the list comprehension:

%timeit vector2(a,b)
1000 loops, best of 3: 243 µs per loop

In the third attempt, vector2 tries to be a well-behaved Python function: numeric types are handled using gmpy2's type conversion rules, error checking is done, the context settings are checked, subnormal numbers are created if requested, exceptions are raised if needed, etc. If you ignore all those Python enhancements and assume all the values are already gmpy2.mpfr, I was able to get the time down in the fourth attempt:

%timeit vector2(a,b)
10000 loops, best of 3: 200 µs per loop

The fourth version doesn't do enough error checking to be of general use, but a version between the third and fourth attempts may be possible.

It is possible to decrease Python overhead, but as the precision increases, the effective savings decreases.

Answer by Thomas Baruchel

A current project is qd, which will be able to embed high-precision numbers in Numpy arrays by taking advantage of the fixed memory size of its values. Right now the type is available for Numpy, but not yet as a dedicated dtype; you can already use it with an object dtype, however.

(If you want to see what the dtype will look like, you can already uncomment the relevant line to compile it with Numpy support; it should work well enough for a quick look, but no functions have been implemented yet. The next release should be in September or October.)

Answer by ali_m

As far as I'm aware, there is no existing Python library that supports vectorized array operations on multiple precision values. There's unfortunately no particularly efficient way to use multiple precision values within a numpy ndarray, and it's extremely unlikely that there ever will be, since multiple precision values are incompatible with numpy's basic array model.

Each element in a floating point numpy ndarray takes up the same number of bytes, so the array can be represented in terms of the memory address of the first element, the dimensions, and a regular byte offset (or stride) between consecutive array elements.

This scheme has significant performance benefits - adjacent array elements are located at adjacent memory addresses, so sequential reads/writes to an array benefit from better locality of reference. Striding is also very important for usability, since it allows you to do things like operating on views of the same array without creating new copies in memory. When you do x[::2], you are really just doubling the stride over the first axis of the array, such that you address every other element.
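A quick way to see this at work (my example, using nothing beyond plain numpy):

import numpy as np

x = np.arange(10, dtype=np.float64)
print(x.strides)               # (8,): consecutive elements are 8 bytes apart
y = x[::2]                     # a view, not a copy: the stride is simply doubled
print(y.strides)               # (16,)
print(np.shares_memory(x, y))  # True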

By contrast, an array containing multiple precision values would have to contain elements of unequal size, since higher precision values would take up more bytes than low-precision values. A multiple precision array therefore cannot be regularly strided, and loses out on the benefits mentioned above.
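This is also why the dtype=object workaround mentioned in the comments can be strided at all: the array stores fixed-size pointers to Python objects rather than the values themselves, at the cost of an extra indirection and a Python-level call per element. A small illustration (mine, assuming mpmath is installed):

import numpy as np
from mpmath import mpf

arr = np.array([mpf('1.1'), mpf('2.2') ** 100], dtype=object)
print(arr.itemsize)   # pointer size (typically 8 bytes), regardless of the values stored
print(arr.strides)    # regular, pointer-sized strides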

In addition to the problems with constructing arrays, even plain arithmetic on multiple precision scalars is likely to be much slower than for floating point scalars. Almost all modern processors have specialized floating point units, whereas multiple precision arithmetic must be implemented in software rather than hardware.
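A rough way to see the scalar gap for yourself (my sketch; the absolute numbers will vary with machine, precision, and library version):

import timeit
from mpmath import mp, mpf

mp.prec = 250
x_f, y_f = 1.1, 2.2
x_m, y_m = mpf('1.1'), mpf('2.2')

print(timeit.timeit('x_f * y_f', globals=globals(), number=1_000_000))  # hardware floats
print(timeit.timeit('x_m * y_m', globals=globals(), number=1_000_000))  # software multiprecision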

I suspect that these performance issues might be a big part of the reason why there isn't already a Python library that provides the functionality you're looking for.