Most efficient way to print differences of two arrays?

64 views Asked by At

Recently, a colleague of mine asked me how he could test the equalness of two arrays. He had two sources of Address and wanted to assert that both sources contained exactly the same elements, although order didn't matter. Both using Array or like List in Java, or IList would be okay, but since there could be two equal Address objects, things like Sets can't be used.

In most programming languages, a List already has an equals method doing the comparison (assuming that the collection was ordered before doing it), but there is no information about the actual differences; only that there are some, or none.

The output should inform about elements that are in one collection but not in the other, and vice-versa.

An obvious approach would be to iterate through one of the collections (if one of them is), and just call contains(element) on the other one, and doing it the the other way around afterwards. Assuming a complexity of O(n) for contains, that would result in O(2n²), if I'm correct.

Is there a more efficient way for getting the information "A1 and A2 isn't in List1, A3 and A4 isn't in List2"? Are there data structures better suited for doing this job than lists? Is it worth it to sort the collections before and using a custom, binary search contains?

2

There are 2 answers

5
Davide On

The first thing that comes to mind is using set difference

In pseudo-python

addr1 = set(originalAddr1)
addr2 = set(originalAddr2)
in1notin2 = addr1 - addr2
in2notin1 = addr2 - addr1
allDifferences = in1notin2 + in2notin1

From here you can see that set difference is O(len(set)) and union is O(len(set1) + len(set2)) giving you a linear time solution with this python specific set implementation, instead of quadratic as you suggest.

I believe other popular languages tend to implement these type of data structures pretty much the same way, but can't really be sure about this.

0
BeyelerStudios On

Is it worth to sort the collection [...]?

Compare the naive approach O(n²) to sorting two lists in O(n logn) and then comparing them in O(n) - or sorting one list in O(n logn) and iterating over the other in O(n)