Non-capturing regex in combination with strings


I want to write a program that searches a string for one or more words. If these words are found, I want to replace them with something else using regex_replace; for this example, simply a space " ". What I don't want, however, is to replace whatever stands between them. I've written the following lines (with Visual Studio 2015 C++):

#include <cstdlib>
#include <iostream>
#include <regex>
#include <string>

using namespace std;

int main()
{
    string test{ "Hier drin wird gesucht und auch ersetzt." };
    string a{ "drin" };
    string b{ "auch" };
    regex r( R"(\b)" + a + R"(\b.*\b)" + b + R"(\b)");
    string result = regex_replace(test, r, " ");
    cout << result << endl;
    system("pause");

    return 0;
}

I've declared variables for the words I'm looking for because in the real program they come from a file. I know that there is the concept of non-capturing groups, but if I replace the line

    regex r( R"(\b)" + a + R"(\b.*\b)" + b + R"(\b)");

by

    regex r( R"(\b)" + a + R"(\b(?:.*)\b)" + b + R"(\b)");

the output is still the same, namely

Hier ersetzt.

So everything between the two words, including the two words themselves, is replaced even though I added the non-capturing group (the same happens with more words). I think I'm misunderstanding something about these groups. I've already tried splitting the expression above into three groups, but the result was always wrong.

What is going wrong here?

There is 1 answer

Wiktor Stribiżew (best answer)

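Note that a.*b is the same as a(?:.*)b: a non-capturing group only groups the subpattern, its text is still part of the match, and regex_replace replaces the whole match. You need a capturing group instead, a(.*)b, and a backreference to it in the replacement. Also, the lazy quantifier *? might be a better option if you plan to match multiple occurrences of the pattern on a line. If you process a whole multi-line string rather than a single line, replace . with [\s\S], because . does not match line breaks: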

regex r( R"(\b)" + a + R"(\b(.*?)\b)" + b + R"(\b)"); // See (.*?), capturing group
string result = regex_replace(test, r, "$1");  // See $1, backreference to Group 1 contents
