In java (or any other PL that uses same hashcode function) Is it safe to say that:
Two strings of equal length have same hashcode if and only if they are equal
let's just assume the hashcode function will be
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
use case:
for comparing two massive collections of strings, it'll be theoretically faster to use option 1
rather than option 2
because hashcode computation will be done once as String will cache the its value:
OPTION 1:
for(String s1 : collection1){
for(String s2 : collection2){
if((s1.hashCode() == s2.hashCode()) && (s1.length()==s2.length()){
System.out.println("matched");
}
}
}
OPTION 2
for(String s1 : collection1){
for(String s2 : collection2){
if(s1.equals(s2)){
System.out.println("matched");
}
}
}
UPDATE:
after @tobias_k comment I realized that statment wrong, So I change the question.
is there a max length M for string that for any two strings of equal length their hashcode will be same if and only if they are equal
Certainly not. Take strings of length 100. There are many more strings of length 100 than there are different
int
numbers, so there must be plenty of collisions.If there is such a length M, then it is at most 1 (and thus not very useful), as shown with the examples of hash code collisions even for strings of length 2 in Eren's and KDP's answers.
To make your comparison faster, you could first compare the hashcode, and then compare with
equals
only if the hashcode is the same.(Note: I have not profiled whether this is really faster than just using
equals
in the first place.)You could also put all the strings from
collection1
into aSet
and then test whether the strings fromcollection2
are in that set. This will basically do the same thing: First compare the hash code, and then useequals
if it finds entries with the same hash.