Uniquely identify an array of numbers whose order doesn't matter

586 views Asked by At

Let's say I have these 3 integer arrays:

int[] a = {1, 2, 3};
int[] b = {3, 4, 5};
int[] c = (2, 1, 3};

I'm looking for the most efficient code that will consider a the same as c (because they contain the same numbers but in a different order), but consider a different from b, and b different from c.

I know I can sort them all so that c becomes {1, 2, 3} and therefore the same as a, but I'm comparing hundreds of arrays with more than three numbers each and I don't want my program to sort each one of them, I'm thinking there must be a better way.

Also, taking the sum, for example, wouldn't work because the sum of numbers in {1, 4, 5} is the same as that of numbers in {1, 3, 6}.

And the product wouldn't work either because the product of numbers in {1, 2, 6} is the same as that of numbers in {1, 3, 4}.

3

There are 3 answers

2
Mureinik On BEST ANSWER

Sorting is an O(nlog(n) operation (in the worst case). You could, instead, have an O(n) solution by running over both arrays and just counting the elements in it:

public static boolean hasSameElements(int[] a, int[] b) {
    return countElements(a).equals(countElements(b);)
}

private static Map<Integer, Long> countElements(int[] arr) {
    return Arrays.stream(arr)
                 .boxed()
                 .collect(Collectors.groupingBy(Function.identity(), 
                          Collectors.counting()));
}

EDIT:
While it won't change the big-O notation of the algorithm, a slightly less elegant solution could perform better for non-matching arrays by failing fast:

public static boolean hasSameElements(int[] a, int[] b) {
    if (a.length != b.length) {
        return false;
    }

    Map<Integer, Long> popCount =
            Arrays.stream(a)
                  .mapToObj(Integer::valueOf)
                  .collect(Collectors.groupingBy(Function.identity(), 
                           Collectors.counting()));

    for (int elem : b) {
        Long count = popCount.get(elem);
        if (count == null) {
            return false;
        }
        count--;
        if (count == 0L) {
            popCount.remove(elem);
        } else {
            popCount.put(elem, count);
        }
    }

    return popCount.isEmpty();
}
1
Roberto Attias On

the idea of reducing all the numbers into the array to a single number for comparison is on the right track. This approach is called hashing. A good property of hashing is to avoid mapping multiple sets of inputs to the same output as much as possible. Addition and Multiplication are very poor hashing functions, and there is a large literature on good hashing functions. In java, you can use the Arrays.hashCode(int[]) function to get a good hashing of the array.

Of course, despite this hashing having good quality, there is no guarantee two arrays won't produce the same value. So, here is how you can implement a comparison function:

boolean equals(int[] a, int[]b) {
  if (a.length != b.length)
    return false;
  if (a == b)
    return true;
  if (Arrays.hashCode(a) != Arrays.hashCode(b)) 
    return false;
  for(int i=0; i< a.length; i++) {
    if (a[i] != b[i])
      return false;
  return true;
}

Note that the element-by-element comparison is executed only if the two parameters are not pointing to the same array, and the hashing is the same.

EDIT: it was pointed out to me that the Arrays.hashCode may produce different results for arrays with the same values in different orders, which is true, and in fact I was ignoring the ordering. Sorting the arrays first is O(MNlogN) in the average case (with M being the number of arrays and N the average length). However, if the number of repetition in the array is small the avg complexity is closer to O(MN) as sorting is needed only for arrays with the same length and hash code, which is very unlikely.

boolean equals(int[] a, int[]b) {
  if (a.length != b.length)
    return false;
  if (a == b)
    return true;
  if (hashCode(a) != hashCode(b)) 
    return false;
  Arrays.sort(a);
  Arrays.sort(b);
  for(int i=0; i< a.length; i++) {
    if (a[i] != b[i])
      return false;
  return true;
}

private int hashCode(int[] a) {
  int res = 0;
  for(int i=0; i<a.length; i++) {
    res ^= a[i]
  }
  return res;
}
3
rossum On

I assume that the arrays do not have any internal duplicates like, {1, 2, 2, 4}. First check that the lengths are the same. If they are the same length, then create a Set from the first array. Add the elements from the second array, A2, to the set one at a time. As each element from A2 is added, check if it was added and not rejected as a duplicate. If any element from A2 is not a duplicate, then the two arrays are not identical. If all elements from A2 are rejected as duplicates then the two arrays are identical in size and contents, though not necessarily in order.