Recently, a colleague of mine asked me how he could test the equalness of two arrays. He had two sources of Address
and wanted to assert that both sources contained exactly the same elements, although order didn't matter.
Both using Array
or like List
in Java, or IList
would be okay, but since there could be two equal Address
objects, things like Set
s can't be used.
In most programming languages, a List
already has an equals
method doing the comparison (assuming that the collection was ordered before doing it), but there is no information about the actual differences; only that there are some, or none.
The output should inform about elements that are in one collection but not in the other, and vice-versa.
An obvious approach would be to iterate through one of the collections (if one of them is), and just call contains(element)
on the other one, and doing it the the other way around afterwards. Assuming a complexity of O(n)
for contains
, that would result in O(2n²)
, if I'm correct.
Is there a more efficient way for getting the information "A1 and A2 isn't in List1, A3 and A4 isn't in List2"? Are there data structures better suited for doing this job than lists? Is it worth it to sort the collections before and using a custom, binary search contains?
The first thing that comes to mind is using set difference
In pseudo-python
From here you can see that set difference is
O(len(set))
and union isO(len(set1) + len(set2))
giving you a linear time solution with this python specificset
implementation, instead of quadratic as you suggest.I believe other popular languages tend to implement these type of data structures pretty much the same way, but can't really be sure about this.