Why is String.equalsIgnoreCase is so slow

5.8k views Asked by At

I encountered a question in interview to write a method to check for similar words irrespective of character cases.

I answered it by using the difference of ASCII value for each pair of characters. But at home, when I went through the actual implementation of it in String.class, I get disturbed - Why is it implemented that way!

I tried to draw a comparison between inbuilt and my custom method, this way-

public class EqualsIgnoreCase {

    public static void main(String[] args) {
        String str1 = "Srimant @$ Sahu 959s";
        String str2 = "sriMaNt @$ sAhu 959s";

        System.out.println("Avg millisecs with inbuilt () - " + averageOfTenForInbuilt(str1, str2));
        System.out.println("\nAvg millisecs with custom () - " + averageOfTenForCustom(str1, str2));
    }

    public static int averageOfTenForInbuilt(String str1, String str2) {
        int avg = 0;
        for (int itr = 0; itr < 10; itr++) {
            long start1 = System.currentTimeMillis();
            for (int i = 0; i < 100000; i++) {
                str1.equalsIgnoreCase(str2);
            }
            avg += System.currentTimeMillis() - start1;
        }
        return avg / 10;
    }

    public static int averageOfTenForCustom(String str1, String str2) {
        int avg = 0;
        for (int itr = 0; itr < 10; itr++) {
            long start2 = System.currentTimeMillis();
            for (int i = 0; i < 100000; i++) {
                isEqualsIgnoreCase(str1, str2);
            }
            avg += System.currentTimeMillis() - start2;
        }
        return avg / 10;
    }

    public static boolean isEqualsIgnoreCase(String str1, String str2) {
        int length = str1.length();
        if (str2.length() != length) {
            return false;
        }

        for (int i = 0; i < length; i++) {
            char ch1 = str1.charAt(i);
            char ch2 = str2.charAt(i);

            int val = Math.abs(ch1 - ch2);
            if (val != 0) {
                if (isInAlphabetsRange(ch1, ch2)) {
                    if (val != 32) {
                        return false;
                    }
                } else {
                    return false;
                }
            }
        }
        return true;
    }

    public static boolean isInAlphabetsRange(char ch1, char ch2) {
        return (((ch1 <= 122 && ch1 >= 97) || (ch1 <= 90 && ch1 >= 65)) && ((ch2 <= 122 && ch2 >= 97) || (ch2 <= 90 && ch2 >= 65)));
    }

}

Output-

Avg millisecs with inbuilt () - 14

Avg millisecs with custom () - 5

I found that the inbuilt method is hitting efficiency, as because of lots of checks and method calls. Is there any specific reasons behind such an implementation? Or Am I missing something in my logic?

Any suggestions, will be heartily appreciated!

5

There are 5 answers

17
Paul LeBeau On BEST ANSWER

Your routine only handles ASCII characters. The system one handles all unicode characters.

Consider following example:

public class Test {

    public static void main(String[] args) {
        System.out.println((int) 'ě'); // => 283
        System.out.println((int) 'Ě'); // => 282 
    }

}
2
Nicolas Rinaudo On

This might not be the only reason, but the fact that your solution doesn't actually work for all possible strings is definitely a factor.

There are some (annoying) locales for which two characters might have the same upper case but not the same lower case. For this reason, in order to work (most of the time, see Turkish), the canonical implementation must compare the strings char-for-char in both their lower and upper cases.

Your implementation is probably perfect 99% of the time, especially if you only have to deal with the english locale, but the core library implementation can unfortunately not make such assumptions.

7
Am_I_Helpful On

I think that the checking of

String1.equalsIgnoreCase(String2)

the one which is provided has a much better character acceptance and it accepts all kind of Character Values included in the Unicode;but; what you have tried to figure out through your custom code is that you are comparing only the English Alphabetical characters.

So, I think,on the lines of Pavel Horel,the commentator on your post,that due to the sophistication it provides for comparison between all kinds of Unicode characers,it might be taking more time.

3
meriton On

Your method is incorrect in many ways. For instance, it considers "!" equal to "B", "B" equal to "1", but "!" not equal to "1" (so it isn't transitive as we would expect an equals method to be).

Yes, it is quite easy to write an incorrect implementation for that method that is both faster and simpler. A fair challenge would be to write a correct one, i.e. that correctly handles all arguments the JDK implementation does.

You might also wish to look at How do I write a correct micro-benchmark in Java? to get more reliable performance measurements.

0
zbstof On

I think this exerpt from String.java is relevant:

if (ignoreCase) {
    // If characters don't match but case may be ignored,
    // try converting both characters to uppercase.
    // If the results match, then the comparison scan should
    // continue.
    char u1 = Character.toUpperCase(c1);
    char u2 = Character.toUpperCase(c2);
    if (u1 == u2) {
        continue;
    }
    // Unfortunately, conversion to uppercase does not work properly
    // for the Georgian alphabet, which has strange rules about case
    // conversion.  So we need to make one last check before
    // exiting.
    if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
        continue;
    }
}