Java replaceAll() vs. a for loop with replace() in it


I have a string that holds some data. I need to remove some special characters from it and then tokenize it.

Which of the following two methods should be preferred for better performance?

String data = "Random data (For performance) Waiting for reply?";
// replaceAll() takes a regex, so ?, ., ( and ) must be escaped
data = data.replaceAll("\\?", "");
data = data.replaceAll("\\.", "");
data = data.replaceAll(",", "");
data = data.replaceAll("\\(", "");
data = data.replaceAll("\\)", "");

String[] tokens = data.split("\\s+");  
for(int j = 0; j < tokens.length; j++){
  //Logic on tokens
}  

OR

String data = "Random data (For performance) Waiting for reply?";

String[] tokens = data.split("\\s+");  
for(int j = 0; j < tokens.length; j++){
    tokens[j]=tokens[j].replace("?", "");
    tokens[j]=tokens[j].replace(".", "");
    tokens[j]=tokens[j].replace(",", "");
    tokens[j]=tokens[j].replace("(", "");
    tokens[j]=tokens[j].replace(")", "");      

  //Logic on each token
}  

Or is there another approach that would perform better? (Some statistics on this would be greatly appreciated.)

The for loop shown above is needed anyway, because it performs other logic on each token.
So the question is: is it faster to run the replacement once over the whole string, or to run it on each token inside the loop that executes regardless?

In other words: replace once and then perform the other operations, or replace token by token and then perform the required operation on each.

Thanks in Advance

2

There are 2 answers

4
Vadzim On BEST ANSWER

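Plain replace() is enough; no loops are needed.

replaceAll() interprets its first argument as a regular expression and runs the regex engine, which carries much more performance overhead.

The "All" suffix is commonly misunderstood: replace() also replaces every occurrence, but it treats its argument as literal text rather than as a pattern.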

See Difference between String replace() and replaceAll().
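
To make the difference concrete, here is a minimal sketch. The class and method names are hypothetical and chosen only for illustration. It shows that replace() removes every literal dot, while replaceAll() with an unescaped "." treats the dot as "any character".

```java
final class ReplaceVsReplaceAll {
    // Literal replacement: every "." in the input is removed.
    static String stripDotsLiteral(String s) {
        return s.replace(".", "");
    }

    // Regex replacement: an unescaped "." matches any character,
    // so every character of a single-line input is removed.
    static String stripDotsRegexUnescaped(String s) {
        return s.replaceAll(".", "");
    }

    // Regex replacement with the dot escaped behaves like the literal version.
    static String stripDotsRegexEscaped(String s) {
        return s.replaceAll("\\.", "");
    }

    public static void main(String[] args) {
        String data = "a.b.c";
        System.out.println(stripDotsLiteral(data));        // abc
        System.out.println(stripDotsRegexUnescaped(data)); // (empty string)
        System.out.println(stripDotsRegexEscaped(data));   // abc
    }
}
```

The same applies to "?", "(" and ")": passed unescaped to replaceAll() they are regex metacharacters, whereas replace() takes them literally.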

Update

I found a very similar question:

Removing certain characters from a string

2
rlinden On

I am not aware of statistics for this kind of problem, but if you are concerned about performance, I would first replace the various replaceAll() calls with a single one, like this:

data = data.replaceAll("\\?|\\.|\\)|\\(|,", "");

It might be faster, since the string is scanned once instead of five times.
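
As a rough sketch of how the single-call idea could be taken further (this is not a measured result), the pattern can be written as a character class and compiled once, or the regex can be avoided entirely with a single pass over the characters. The class name PunctuationStripper and both method names are hypothetical; which variant is faster would have to be benchmarked on real data.

```java
final class PunctuationStripper {
    // Compiled once and reused. Inside a character class,
    // ? . , ( ) are literals and need no escaping.
    private static final java.util.regex.Pattern PUNCTUATION =
            java.util.regex.Pattern.compile("[?.,()]");

    // Single regex pass using the precompiled pattern.
    static String stripWithRegex(String s) {
        return PUNCTUATION.matcher(s).replaceAll("");
    }

    // Regex-free alternative: copy every character that is not
    // one of the unwanted ones into a StringBuilder.
    static String stripWithLoop(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if ("?.,()".indexOf(c) < 0) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String data = "Random data (For performance) Waiting for reply?";
        String cleaned = stripWithRegex(data);
        System.out.println(cleaned); // Random data For performance Waiting for reply
        for (String token : cleaned.split("\\s+")) {
            // Logic on each token
            System.out.println(token);
        }
    }
}
```

Either way, the cleanup happens once on the whole string, and the token loop is left free for the per-token logic.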