C++ Regex to match words without punctuation


I searched but couldn't find anything. Rather than waste more time when the answer may be obvious to someone else, I'm asking here. The only site that has been useful so far is http://softwareramblings.com/2008/07/regular-expressions-in-c.html, but its samples are far too simplistic. I'm using Visual Studio 2010.

#include <regex>

[...]

string seq = "Some words. And... some punctuation.";
regex rgx("\w");

smatch result;
regex_search(seq, result, rgx);

for(size_t i=0; i<result.size(); ++i){
    cout << result[i] << endl;
}

Expected output would be:

Some
words
And
some
punctuation

Thanks.

There are 2 answers

6
John Dibling On BEST ANSWER

A few things here.

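First, the backslash in your regex string needs to be escaped. It's still a C++ string literal, after all, so the compiler interprets backslash escapes before the regex engine ever sees the pattern.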

regex rgx("\\w");
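As an aside, if your compiler supports C++11 raw string literals, the same pattern can be written without doubling the backslash. As far as I know Visual Studio 2010 does not support them, so treat this as a sketch for newer compilers. The two function names are made up purely for illustration.

```cpp
#include <string>

// Hypothetical helpers, only here to compare two spellings of one pattern.

// Ordinary string literal: the backslash is doubled for the C++ compiler.
std::string escaped_pattern() { return "\\w"; }

// Raw string literal (C++11): everything between R"( and )" is taken verbatim.
std::string raw_pattern() { return R"(\w)"; }
```

Both functions return the same two-character string, a backslash followed by w, which is what the regex engine needs to receive.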

Also, the regex \w matches just one "word character". If you want to match an entire word, you need to use:

regex rgx("\\w+");
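To see the difference between the two patterns, here is a minimal sketch that returns only the first match, which is all a single regex_search call gives you. The helper name first_match is my own invention, not part of the library.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns the first match of `pattern` in `text`,
// or an empty string when nothing matches.
std::string first_match(const std::string& text, const std::string& pattern)
{
    std::regex rgx(pattern);
    std::smatch result;
    if (std::regex_search(text, result, rgx))
        return result[0].str();   // result[0] is the whole match
    return std::string();
}
```

With the question's input, first_match(seq, "\\w") yields "S", while first_match(seq, "\\w+") yields "Some". Looping over result.size() as in the question only walks the sub-matches of that one match, not the later words in the string.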

Finally, to iterate through all the matches, you need to use an iterator; regex_search on its own stops at the first match. Here's a complete working example:

#include <regex>
#include <string>
#include <iostream>
using namespace std;

int main()
{
    string seq = "Some words. And... some punctuation.";
    regex rgx("\\w+");

    for( sregex_iterator it(seq.begin(), seq.end(), rgx), it_end; it != it_end; ++it )
        cout << (*it)[0] << "\n";
}
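
If you would rather keep the words than print them, the same loop can fill a container. This is only a sketch: extract_words is a made-up name, and I'm assuming a std::vector<std::string> is a convenient result type for you.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of word characters in `text`.
std::vector<std::string> extract_words(const std::string& text)
{
    const std::regex rgx("\\w+");
    std::vector<std::string> words;

    // A default-constructed sregex_iterator acts as the end-of-sequence marker.
    for (std::sregex_iterator it(text.begin(), text.end(), rgx), it_end; it != it_end; ++it)
        words.push_back(it->str());   // str() with no argument is the whole match

    return words;
}
```

Called on the string from the question, it returns Some, words, And, some, punctuation, in that order. Keep in mind that \w also matches digits and the underscore, so something like snake_case comes back as a single word.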
0
Eugene On

Try this:

string seq = "Some words. And... some punctuation.";
regex rgx("(\\w+)");

regex_iterator<string::iterator> it(seq.begin(), seq.end(), rgx);
regex_iterator<string::iterator> end;

for (; it != end; ++it)
{
    cout << it->str() << endl;
}
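
A note on the parentheses in the pattern above: they create capture group 1, but since the group wraps the entire pattern it holds the same text as the whole match, so it->str() works with or without them. Here is a small sketch that checks this; the function name is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when, for every match of "(\\w+)" in `text`,
// capture group 1 holds the same text as the whole match (group 0).
bool group_equals_whole_match(const std::string& text)
{
    const std::regex rgx("(\\w+)");
    for (std::sregex_iterator it(text.begin(), text.end(), rgx), it_end; it != it_end; ++it) {
        // size() counts the whole match plus each capture group, so 2 here.
        if (it->size() != 2 || it->str(0) != it->str(1))
            return false;
    }
    return true;
}
```

The group only becomes useful when it covers part of the pattern, for example to pull out a word without the punctuation that follows it.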