I have a string that holds some data, and I need to remove some special characters from it and tokenize the data.
Which of the following two methods should be preferred for better performance:
String data = "Random data (For performance) Waiting for reply?";
data=data.replaceAll("?", "");
data=data.replaceAll(".", "");
data=data.replaceAll(",", "");
data=data.replaceAll("(", "");
data=data.replaceAll(")", "");
String[] tokens = data.split("\\s+");
for(int j = 0; j < tokens.length; j++){
//Logic on tokens
}
OR
String data = "Random data (For performance) Waiting for reply?";
String[] tokens = data.split("\\s+");
for(int j = 0; j < tokens.length; j++){
tokens[j]=tokens[j].replace("?", "");
tokens[j]=tokens[j].replace(".", "");
tokens[j]=tokens[j].replace(",", "");
tokens[j]=tokens[j].replace("(", "");
tokens[j]=tokens[j].replace(")", "");
//Logic on each token
}
Or is there any other approach that would perform better? (Some statistics on this would be greatly appreciated.)
The for loop provided above will be used for performing other logic on each token.
Is replacing on the whole content faster, or is replacing on each token inside the for loop (which is executed regardless of the replacing) faster?
That is: replace once and then perform the other operations, or replace step by step for each token and then perform the required operation?
Thanks in Advance
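Just replace() would be enough, without any loops. replaceAll() uses the regex engine under the hood, which has much more performance overhead. There seems to be a common misunderstanding of the "All" suffix: replace() also replaces every occurrence. The difference is that replace() treats its argument as literal text, while replaceAll() treats it as a regular expression. That also means the calls replaceAll("?", "") and replaceAll("(", "") in the first snippet throw a PatternSyntaxException, and replaceAll(".", "") removes every character.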
See Difference between String replace() and replaceAll().
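As a minimal sketch of the suggestion above (the class and method names are hypothetical, chosen only for illustration), the literal replace() calls can be chained on the whole string before splitting, so the loop only has to deal with clean tokens:

```java
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class TokenizeExample {

    // Strips the five punctuation characters with literal replace()
    // calls (no regex involved), then splits on runs of whitespace.
    static String[] cleanAndTokenize(String data) {
        String cleaned = data.replace("?", "")
                             .replace(".", "")
                             .replace(",", "")
                             .replace("(", "")
                             .replace(")", "");
        // trim() avoids an empty first token when the input starts with whitespace
        return cleaned.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String data = "Random data (For performance) Waiting for reply?";
        // prints [Random, data, For, performance, Waiting, for, reply]
        System.out.println(Arrays.toString(cleanAndTokenize(data)));
    }
}
```

This does not by itself say which of the two variants in the question is faster; that would have to be measured on realistic input.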
Update
Found a very similar question to this one:
Removing certain characters from a string
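For comparison, here is a rough sketch of a different technique, not taken from the linked question: the unwanted characters are dropped in a single pass with a StringBuilder, so the string is scanned once instead of once per replace() call. The class name, method name, and the UNWANTED constant are assumptions made for this example, and whether this is actually faster depends on the input and the JVM, so it should be benchmarked rather than assumed.

```java
// Hypothetical helper class, for illustration only.
public class SinglePassStrip {

    // Characters to remove; the same five as in the question.
    private static final String UNWANTED = "?.,()";

    // Copies every character that is not in UNWANTED into a new string.
    static String strip(String data) {
        StringBuilder sb = new StringBuilder(data.length());
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (UNWANTED.indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String data = "Random data (For performance) Waiting for reply?";
        // prints: Random data For performance Waiting for reply
        System.out.println(strip(data));
    }
}
```

The result can then be passed to split("\\s+") exactly as in the question.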