Approximate equality of unordered set of complex floats


I want to do nose testing of numpy arrays of complex floats that are unordered.

So for instance, if

a = [1+1j, 1-1j, 2+2j, 2-2j, 2+2j, 2-2j]

and

b = [2+2j, 2-2j, 1+1j, 1.000000000000001-1j, 2+2j, 2-2j]

the assert should succeed, as they have approximately the same values the same number of times. Order doesn't matter.

For regular floats, assert_array_almost_equal(np.sort(a), np.sort(b)) would be fine, but that doesn't work here: complex values sort by real part first and then by imaginary part, so the tiny float error changes the ordering and the arrays come out sorted as:

a: [ 1.-1.j,  1.+1.j,  2.-2.j,  2.-2.j,  2.+2.j,  2.+2.j]    
b: [ 1.+1.j,  1.-1.j,  2.-2.j,  2.-2.j,  2.+2.j,  2.+2.j]
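
For reference, a small snippet that reproduces the sorted output above (assuming numpy imported as np; the final assertion is the one that fails):

import numpy as np

a = np.array([1+1j, 1-1j, 2+2j, 2-2j, 2+2j, 2-2j])
b = np.array([2+2j, 2-2j, 1+1j, 1.000000000000001-1j, 2+2j, 2-2j])

# Complex values sort by real part first, then imaginary part, so the
# tiny real-part error in b places 1.000000000000001-1j after 1+1j.
print(np.sort(a))
print(np.sort(b))

# Fails: element 0 is 1-1j in sorted a but 1+1j in sorted b.
np.testing.assert_array_almost_equal(np.sort(a), np.sort(b))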

Is there a built-in way to do this? If not, I guess I could resort to something like cplxreal, but that seems like a lot to add to a testing file.


There are 4 answers

jakevdp (best answer)

I encountered this problem recently and realized that it can be accomplished quite efficiently by checking whether the structural rank of the matrix of pairwise comparisons is equal to the size of the inputs.

According to the scipy.sparse.csgraph.structural_rank documentation, "A graph has full structural rank if it is possible to permute the elements to make the diagonal zero-free." This is exactly the condition we want: we want to check whether the pairwise comparison matrix can be permuted such that the result has a fully nonzero diagonal.

You can implement it like this:

import numpy as np
from scipy.sparse import csgraph, csr_matrix

def sets_approx_equal(x, y, **kwds):
  """
  Check if x and y are permutations of the same approximate set of values.

  Extra keyword arguments are passed on to np.isclose (e.g. rtol, atol).
  """
  x = np.asarray(x)
  y = np.asarray(y)
  assert x.ndim == y.ndim == 1
  if len(x) != len(y):
    return False
  # mat[i, j] is True when x[i] is approximately equal to y[j].
  mat = np.isclose(x[:, None], y[None, :], **kwds)
  # Full structural rank means the rows can be permuted so that the whole
  # diagonal is True, i.e. a one-to-one approximate matching exists.
  rank = csgraph.structural_rank(csr_matrix(mat))
  return rank == len(y)

Testing on your example inputs gives the expected results:

a = [1+1j, 1-1j, 2+2j, 2-2j, 2+2j, 2-2j]
b = [2+2j, 2-2j, 1+1j, 1.000000000000001-1j, 2+2j, 2-2j]
print(sets_approx_equal(a, b))
# True

b[-1] = 1 - 1j
print(sets_approx_equal(a, b))
# False

To convince ourselves further that this is working, we can generate a large number of known equal and unequal permutations, and check that the function is returning the expected result for these cases:

def make_permutations(rng, N, equal=True, eps=1E-10):
  """Make two permuted arrays of complex values.

  If equal=True, they are derived from the same initial set of values;
  if equal=False, they are not.
  """
  x = rng.randn(N) + 1j * rng.randn(N)
  y = x if equal else rng.randn(N) + 1j * rng.randn(N)
  # Add tiny independent noise so that "equal" arrays match only approximately.
  x = x + eps * (rng.randn(N) + 1j * rng.randn(N))
  y = y + eps * (rng.randn(N) + 1j * rng.randn(N))
  return rng.permutation(x), rng.permutation(y)

rng = np.random.RandomState(0)
for i in range(1000):
  x, y = make_permutations(rng, 5, equal=True)
  assert sets_approx_equal(x, y)

  x, y = make_permutations(rng, 5, equal=False)
  assert not sets_approx_equal(x, y)

Indeed, all the assertions pass.

Slater Victoroff

I'm not sure if there's any way to do this better than O(n^2) in the worst case, but if that's acceptable to you, you could just copy one of the lists and use a tolerance-based equals function with elimination to see if they match.

def equals(a, b, tolerance):
    return abs(a-b) < tolerance

and then iterate through your list, removing matches as you find them:

def equivalent_lists(a, b, tolerance):
    new_b = b[:]  # copy b so that matched items can be removed
    for a_item in a:
        # look for any remaining item in new_b within tolerance of a_item
        truths = [equals(a_item, b_item, tolerance) for b_item in new_b]
        if not any(truths):
            return False
        else:
            new_b.pop(truths.index(True))
    return not bool(new_b)

Seems to work in your initial case, in at least a cursory way:

a = [1+1j, 1-1j, 2+2j, 2-2j, 2+2j, 2-2j]
b = [2+2j, 2-2j, 1+1j, 1.000000000000001-1j, 2+2j, 2-2j]
c = [2+2j, 2-2j, 1+1j, 2-1j, 2+2j, 2-2j]

equivalent_lists(a, b, 0.0001)
>>> True
equivalent_lists(a, c, 0.0001)
>>> False

Not the most beautiful solution, but at least seems to work in a pretty transparent way.

hpaulj

Another approach might be to think in terms of points in a 2d space. a is a set of points. b is another set. Each point in b should be close to a point in a.

diff = np.array(a)[None,:]-np.array(b)[:,None]
X = np.round(diff*diff.conjugate())==0  # round to desired error tolerance
X

array([[False, False,  True, False,  True, False],
       [False, False, False,  True, False,  True],
       [ True, False, False, False, False, False],
       [False,  True, False, False, False, False],
       [False, False,  True, False,  True, False],
       [False, False, False,  True, False,  True]], dtype=bool)

The next step is to test X. If values in a (and b) are distinct, there should be one True for each column and each row. But here there are duplicates.

In [29]: I,J=np.where(X)

In [30]: I
Out[30]: array([0, 0, 1, 1, 2, 3, 4, 4, 5, 5])

In [31]: J
Out[31]: array([2, 4, 3, 5, 0, 1, 2, 4, 3, 5])

set(I) and set(J) are both {0, 1, ..., 5}, so every row and every column has a match. This set test might be enough, even with duplicates; I'd have to explore other combinations of points and mismatches.
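
Expressed as code, that set test just checks that every row and every column of X contains at least one True (only a necessary condition; with duplicates it may not be sufficient):

# every point in b is close to some point in a (rows of X),
# and every point in a is close to some point in b (columns of X)
print(X.any(axis=1).all() and X.any(axis=0).all())
# True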

This answer is incomplete, but might provide some useful ideas.

hpaulj

How about sorting the arrays by their magnitude?

def foo(a):
    a = np.asarray(a)  # accept plain lists as well as arrays
    # a*conj(a) is the squared magnitude; argsort orders the values by it
    return a[np.argsort(a*a.conjugate())]

np.testing.assert_array_almost_equal(foo(a), foo(b))
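
For example, with the question's inputs (foo converts the lists to arrays first). Note that values sharing a magnitude, such as 1+1j and 1-1j, keep whatever relative order the sort gives them, so this relies on the ties lining up the same way in both arrays:

a = [1+1j, 1-1j, 2+2j, 2-2j, 2+2j, 2-2j]
b = [2+2j, 2-2j, 1+1j, 1.000000000000001-1j, 2+2j, 2-2j]
np.testing.assert_array_almost_equal(foo(a), foo(b))  # compares magnitude-sorted values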