How can I match a text for multiple patterns without scanning it multiple times?


I want to match multiple patterns against a given input String, so that the outcome is a list containing all the substrings that match any of my predefined patterns:

String input = "Episode_NN 3_CD was_XX awesome_XX";

final Pattern ruleOne = Pattern.compile("(\\w*_NN\\s|\\w*_NNS\\s)+\\w*_CD");
final Pattern ruleTwo = Pattern.compile(ruleOne.pattern().concat(""));

Matcher matcher = ruleOne.matcher(input);

List<String> ent = new ArrayList<String>();

while (matcher.find()) {
    ent.add(matcher.group());
}

So do I have to add multiple Matchers? That would mean scanning the text multiple times, like so:

while (matcherOne.find() | matcherTwo.find() | ...) {
   ....
}

There are 2 answers

C_B On

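Yes, it's that simple, except that you're better off using the conditional OR operator, written as two bars: ||. It evaluates from left to right and short-circuits: if the first condition is true, the remaining conditions are never evaluated. As a consequence, matcherTwo.find() runs only in iterations where matcherOne.find() has returned false.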

while (matcherOne.find() || matcherTwo.find() || ...) {
   ....
}
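
For comparison, here is a minimal sketch of the multiple-matcher approach written with one loop per pattern, which makes it explicit that every matcher walks the input on its own. The class name MultiMatcherSketch, the helper findAll, and the second rule (\w*_XX\b) are made up for illustration; only the first rule comes from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultiMatcherSketch {

    // Runs every rule over the input in turn and collects all matches.
    // Each rule gets its own Matcher, so the input is scanned once per rule.
    public static List<String> findAll(String input, List<Pattern> rules) {
        List<String> ent = new ArrayList<>();
        for (Pattern rule : rules) {
            Matcher matcher = rule.matcher(input);
            while (matcher.find()) {
                ent.add(matcher.group());
            }
        }
        return ent;
    }

    public static void main(String[] args) {
        String input = "Episode_NN 3_CD was_XX awesome_XX";
        List<Pattern> rules = Arrays.asList(
            Pattern.compile("(\\w*_NN\\s|\\w*_NNS\\s)+\\w*_CD"), // rule from the question
            Pattern.compile("\\w*_XX\\b"));                       // hypothetical second rule
        System.out.println(findAll(input, rules));
    }
}
```

The results come back grouped by rule rather than in order of appearance in the text, and the cost grows with the number of rules, which is the concern raised in the question.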
Wiktor Stribiżew On

Here is how you can combine several alternatives in one pattern, using the alternation operator |:

\w*_NNS?\b|\w*_CD\b

Sample code

String input = "Episode_NN 3_CD was_XX awesome_XX";
final Pattern ruleOne = Pattern.compile("\\w*_NNS?\\b|\\w*_CD\\b");
Matcher matcher = ruleOne.matcher(input);

List<String> ent = new ArrayList<String>();

while (matcher.find()) {
  ent.add(matcher.group());
}
// A List prints in the same bracketed form, so no array conversion is needed
System.out.println(ent);


Output: [Episode_NN, 3_CD]
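
If the rules are kept as separate strings, the same alternation can be built programmatically. The sketch below is one possible way to do that, not the only one: it wraps each rule in a named group so that a single find() loop can also report which rule matched. The class name CombinedRulesSketch, the helpers combine and findAll, and the rule names noun and number are assumptions made for this example. It also assumes that the rule names are valid Java group names (letters and digits, starting with a letter) and that the rules do not rely on numbered backreferences, since wrapping shifts group numbers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedRulesSketch {

    // Joins all rules into one alternation: (?<name1>rule1)|(?<name2>rule2)|...
    public static Pattern combine(Map<String, String> rules) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append("(?<").append(rule.getKey()).append('>')
              .append(rule.getValue()).append(')');
        }
        return Pattern.compile(sb.toString());
    }

    // Uses a single Matcher and reports each match with the name of its rule.
    public static List<String> findAll(String input, Map<String, String> rules) {
        Matcher matcher = combine(rules).matcher(input);
        List<String> ent = new ArrayList<>();
        while (matcher.find()) {
            for (String name : rules.keySet()) {
                // group(name) is null for alternatives that did not take part in this match
                if (matcher.group(name) != null) {
                    ent.add(name + ": " + matcher.group());
                    break;
                }
            }
        }
        return ent;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, which fixes the order of the alternatives
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("noun", "\\w*_NNS?\\b");   // hypothetical rule name
        rules.put("number", "\\w*_CD\\b");   // hypothetical rule name
        System.out.println(findAll("Episode_NN 3_CD was_XX awesome_XX", rules));
    }
}
```

With the sample input this prints [noun: Episode_NN, number: 3_CD]. Matches found this way never overlap, and when several alternatives could match at the same position the leftmost one in the pattern wins, so the result can differ from running each rule with its own Matcher.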