trim() vs indexOf()

484 views

I am parsing hundreds of files, each containing thousands of lines.

I have to check whether each line starts with one of a set of keywords.

I have two options and I'm not sure which one to use.

Option 1:

    String[] keywordsArr = { "Everything", "Think", "Result", "What", "#Shop", "#Cure" };
    for (int i = 0; i < linesOfCode.length; i++) {
        for (String keyWord : keywordsArr) {
            // cheap pre-check: only trim() the line if the keyword occurs somewhere in it
            if (linesOfCode[i].indexOf(keyWord) > -1) {
                if (linesOfCode[i].trim().startsWith(keyWord)) {
                    linesOfCode[i] = "";   // blank out the matching line
                    break;                 // no need to test the remaining keywords
                }
            }
        }
    }

Option 2:

    String[] keywordsArr = { "Everything", "Think", "Result", "What", "#Shop", "#Cure" };
    for (int i = 0; i < linesOfCode.length; i++) {
        for (String keyWord : keywordsArr) {
            // trim() runs on every line, once for each keyword tried
            if (linesOfCode[i].trim().startsWith(keyWord)) {
                linesOfCode[i] = "";
                break;
            }
        }
    }

About 1 in 100 lines starts with a keyword.


There are 3 answers

Answer by John Quasar

Try using continue instead of break. Instead of exiting the loop, continue skips the rest of the current iteration, so the loop carries on with the next item.
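A small sketch of how the two statements behave inside the inner keyword loop. In Java, continue does not leave the loop: it ends the current iteration of the innermost loop, so the remaining keywords are still tried, whereas break exits the inner loop and the outer loop moves on to the next line. The class BreakVsContinue and its comparisons method are hypothetical, written only to count inner-loop iterations for one line.

```java
public class BreakVsContinue {

    // Counts how many keyword comparisons the inner loop performs for one line.
    // useBreak == true: the inner loop stops at the first match.
    // useBreak == false: `continue` is used, which only moves on to the next keyword.
    static int comparisons(String line, String[] keywords, boolean useBreak) {
        int count = 0;
        for (String keyWord : keywords) {
            count++;
            if (line.trim().startsWith(keyWord)) {
                if (useBreak) {
                    break;     // leaves the inner loop; the outer loop would go to the next line
                } else {
                    continue;  // ends this iteration only; the next keyword is still tested
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] keywords = { "Everything", "Think", "Result", "What", "#Shop", "#Cure" };
        String line = "  Think about it";
        System.out.println(comparisons(line, keywords, true));   // 2
        System.out.println(comparisons(line, keywords, false));  // 6
    }
}
```

In this sketch, the continue variant performs every remaining comparison for a matching line, while break stops after the match.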

Answer by rici

There is little point scanning the entire string for a keyword just to avoid looking for the keyword at the beginning of the string. If the idea was to avoid an expensive trim, then it might be reasonable to use a cheaper technique to find the first token in the line.
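One possible "cheaper technique", sketched under the assumption that only leading whitespace matters: advance an index past the characters trim() would strip from the front (code points up to U+0020), then use String.startsWith(prefix, offset), which compares in place without creating a trimmed copy. The class FirstTokenCheck and method startsWithKeyword are hypothetical names for illustration.

```java
public class FirstTokenCheck {

    // Returns true if `line`, ignoring leading whitespace, starts with one of `keywords`.
    // The offset of the first non-whitespace character is computed once per line,
    // and no new String is created (trim() may allocate a substring).
    static boolean startsWithKeyword(String line, String[] keywords) {
        int start = 0;
        // skip the same leading characters trim() removes: anything <= ' '
        while (start < line.length() && line.charAt(start) <= ' ') {
            start++;
        }
        for (String keyWord : keywords) {
            if (line.startsWith(keyWord, start)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] keywords = { "Everything", "Think", "Result", "What", "#Shop", "#Cure" };
        System.out.println(startsWithKeyword("   #Shop open", keywords)); // true
        System.out.println(startsWithKeyword("x = What;", keywords));    // false
    }
}
```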

Note that the startsWith comparison can produce false positives in the case that the line starts with a word whose prefix is a keyword. For example, if the keyword were break, a code line such as:

    breakfast = "ham and eggs";

would be incorrectly eliminated.
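A minimal sketch of guarding against that false positive: after the prefix matches, also require that the next character cannot continue the word. It assumes the lines are code, so "continues the word" is taken to mean any character Character.isJavaIdentifierPart accepts; that choice and the helper name startsWithWord are hypothetical.

```java
public class WordBoundaryCheck {

    // True if `keyword` occurs at `offset` in `line` and is not immediately followed
    // by a character that could continue an identifier, so "breakfast" does not match "break".
    static boolean startsWithWord(String line, int offset, String keyword) {
        if (!line.startsWith(keyword, offset)) {
            return false;
        }
        int end = offset + keyword.length();
        return end == line.length() || !Character.isJavaIdentifierPart(line.charAt(end));
    }

    public static void main(String[] args) {
        String line = "breakfast = \"ham and eggs\";";
        System.out.println(line.trim().startsWith("break"));      // true (the false positive)
        System.out.println(startsWithWord(line, 0, "break"));     // false
        System.out.println(startsWithWord("break;", 0, "break")); // true
    }
}
```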

You might want to investigate using StringTokenizer to extract the first word in the string, or even better, use a regular expression.
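A sketch of the StringTokenizer idea, assuming a keyword must be the entire first whitespace-delimited token: take the first token and look it up in a HashSet, which also avoids the breakfast false positive. The class FirstWordLookup, the KEYWORDS set, and firstWordIsKeyword are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

public class FirstWordLookup {

    static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList(
            "Everything", "Think", "Result", "What", "#Shop", "#Cure"));

    // Extracts the first token with StringTokenizer's default delimiters
    // (space, tab, newline, carriage return, form feed) and checks it with one hash lookup.
    static boolean firstWordIsKeyword(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        return tokens.hasMoreTokens() && KEYWORDS.contains(tokens.nextToken());
    }

    public static void main(String[] args) {
        String[] linesOfCode = { "  Think twice", "breakfast = 1;", "What?", "" };
        for (int i = 0; i < linesOfCode.length; i++) {
            if (firstWordIsKeyword(linesOfCode[i])) {
                linesOfCode[i] = "";
            }
        }
        System.out.println(Arrays.toString(linesOfCode));
    }
}
```

Because the whole token is compared, a line such as "What?" is not treated as starting with What. Note also that the StringTokenizer Javadoc calls it a legacy class and suggests String.split or java.util.regex for new code.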

Answer by maaartinus

This is something regexes are really good for. Your code is essentially equivalent to

    for (int i = 0; i < linesOfCode.length; ++i) {
        // \\s* allows optional leading whitespace; .* consumes the rest of the line
        linesOfCode[i] = linesOfCode[i].replaceAll(
            "^\\s*(Everything|Think|Result|What|#Shop|#Cure).*", "");
    }

but you might want to require a word boundary (\\b) after the keyword. For more speed, compile the regex once, for example:

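    import java.util.regex.Pattern;

    private static final Pattern PATTERN = Pattern.compile(
        "^\\s*(Everything|Think|Result|What|#Shop|#Cure)\\b");

    for (int i = 0; i < linesOfCode.length; ++i) {
        // lookingAt() anchors at the start of the line but, unlike matches(),
        // does not require the pattern to cover the whole line
        if (PATTERN.matcher(linesOfCode[i]).lookingAt()) {
            linesOfCode[i] = "";
        }
    }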
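To keep the regex in sync with keywordsArr rather than retyping the keywords by hand, one option is to build the alternation from the array, quoting each keyword with Pattern.quote so any regex metacharacters are matched literally. This is a sketch; the class KeywordPatternBuilder and method keywordPattern are hypothetical names.

```java
import java.util.regex.Pattern;

public class KeywordPatternBuilder {

    // Builds ^\s*(?:kw1|kw2|...)\b from the keyword array, quoting each keyword.
    static Pattern keywordPattern(String[] keywords) {
        StringBuilder alternation = new StringBuilder();
        for (String keyWord : keywords) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(keyWord));
        }
        return Pattern.compile("^\\s*(?:" + alternation + ")\\b");
    }

    public static void main(String[] args) {
        String[] keywordsArr = { "Everything", "Think", "Result", "What", "#Shop", "#Cure" };
        Pattern pattern = keywordPattern(keywordsArr);
        String[] linesOfCode = { "  #Shop now", "Whatever", "x = Result;" };
        for (int i = 0; i < linesOfCode.length; i++) {
            // lookingAt(): match anchored at the start, rest of the line may be anything
            if (pattern.matcher(linesOfCode[i]).lookingAt()) {
                linesOfCode[i] = "";
            }
        }
        System.out.println(String.join("|", linesOfCode)); // |Whatever|x = Result;
    }
}
```

Here \b only acts as a word-end check because every keyword ends in a word character; a keyword ending in punctuation would need a different terminator.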