String comparison using hashcode() and length()

1.3k views Asked by At

In java (or any other PL that uses same hashcode function) Is it safe to say that:

Two strings of equal length have same hashcode if and only if they are equal

let's just assume the hashcode function will be

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

use case:

for comparing two massive collections of strings, it'll be theoretically faster to use option 1 rather than option 2 because hashcode computation will be done once as String will cache the its value:

OPTION 1:

 for(String s1 : collection1){
   for(String s2 : collection2){
     if((s1.hashCode() == s2.hashCode()) && (s1.length()==s2.length()){
        System.out.println("matched");
     }
   }
 }

OPTION 2

 for(String s1 : collection1){
   for(String s2 : collection2){
     if(s1.equals(s2)){
        System.out.println("matched");
     }
   }
 }

UPDATE:

after @tobias_k comment I realized that statment wrong, So I change the question.

is there a max length M for string that for any two strings of equal length their hashcode will be same if and only if they are equal

4

There are 4 answers

2
tobias_k On BEST ANSWER

Two strings of equal length have same hashcode if and only if they are equal

Certainly not. Take strings of length 100. There are many more strings of length 100 than there are different int numbers, so there must be plenty of collisions.

is there a max length M for string that for any two strings of equal length their hashcode will be same if and only if they are equal

If there is such a length M, then it is at most 1 (and thus not very useful), as shown with the examples of hash code collisions even for strings of length 2 in Eren's and KDP's answers.


To make your comparison faster, you could first compare the hashcode, and then compare with equals only if the hashcode is the same.

for(String s1 : collection1){
    for(String s2 : collection2){
        if (s1.hashCode() == s2.hashCode() && s1.equals(s2)) {
            System.out.println("matched");
        }
    }
}

(Note: I have not profiled whether this is really faster than just using equals in the first place.)

You could also put all the strings from collection1 into a Set and then test whether the strings from collection2 are in that set. This will basically do the same thing: First compare the hash code, and then use equals if it finds entries with the same hash.

Set<String> setFromCollection1 = new HashSet<>(collection1);
for (String s : collection2) {
    if (setFromCollection1.contains(s)) {
        System.out.println("matched");
    }
}
2
Eran On

No, that's wrong.

For example :

System.out.println ("ab " + "ab".hashCode());
System.out.println ("bC " + "bC".hashCode());

Outputs :

ab 3105
bC 3105

Equal hashCode doesn't mean equal Strings, even for Strings of the same length.

1
KDP On

Two strings of equal length have same hashcode if and only if they are equal - NOT NECESSARILY.

check this "FB" and "Ea" are of same length and has same hashcode but they are not equal.

    String s = new String("FB");
    String s1 = new String("Ea");
    System.out.println(s.hashCode()); //2236
    System.out.println(s1.hashCode()); //2236
    System.out.println(s.hashCode()==s1.hashCode()); //true
    System.out.println(s.equals(s1)); //false
0
Ashkrit Sharma On

If you are looking for speed and you matching is going to happen only once then below option is best and it is used by map implementation in java also

if (value1.hashCode() == value2.hashCode() && value1.equals(value2)) {
            System.out.println("matched!");
        }

but if you want to do matching multiple times then you should look for better algo for matching because java implementation is nave http://www.javacodegeeks.com/2010/09/string-performance-exact-string.html post has nice summary of String matching algorithm performance.