stringstream with multiple delimiters


This is another question that I can't seem to find an answer to because every example I can find uses vectors and my teacher won't let us use vectors for this class.

I need to read in a plain-text version of a book one word at a time, using any number of blank spaces ' ' and any number of non-letter characters as delimiters, so any run of spaces or punctuation, of any length, needs to separate words. Here's how I did it when only blank spaces had to be treated as delimiters:

while(getline(inFile, line)) {
    istringstream iss(line);

    while (iss >> word) {
        table1.addItem(word);
    }
}

EDIT: An example of text read in, and how I need to separate it.

"If they had known;; you wished it, the entertainment.would have"

Here's how the first line would need to be separated:

If
they
had
known
you
wished
it
the
entertainment
would
have

The text will contain at the very least all standard punctuation, but also such things as ellipses (...), double dashes (--), etc.

As always, thanks in advance.

EDIT:

So using a second stringstream would look something like this?

while(getline(inFile, line)) {
    istringstream iss(line);

    while (iss >> word) {
        istringstream iss2(word);

        while(iss2 >> letter)  {
            if(!isalpha(letter))
                // do something?
        }
        // do something else?
        table1.addItem(word);
    }
}

There are 2 answers

vsoftco (best answer)

I haven't tested this, as I don't have a g++ compiler in front of me right now, but it should work (aside from minor C++ syntax errors):

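while (getline(inFile, line))
{
    istringstream iss(line);

    while (iss >> word)
    {
        // strip every non-alphanumeric character from the token
        // (needs <algorithm> and <cctype>); note that this removes
        // punctuation without splitting on it, so "entertainment.would"
        // becomes the single word "entertainmentwould"
        word.erase(std::remove_if(word.begin(), word.end(),
                                  [](unsigned char c){ return !std::isalnum(c); }),
                   word.end());
        if (!word.empty())
            table1.addItem(word);
    }
}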
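
One caveat with erasing characters from each token: punctuation inside a token disappears without splitting it, so the two words in "entertainment.would" end up glued together. A possible alternative, sketched below, is to overwrite every non-letter with a space before the line goes into the istringstream, so the usual `>>` extraction does the splitting. The helper names `blankNonLetters` and `forEachWord` are made up for this example, and the callback stands in for `table1.addItem(word)` from the question.

```cpp
#include <algorithm>
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical helper: returns a copy of `line` in which every character
// that is not a letter has been replaced by a space.
std::string blankNonLetters(std::string line)
{
    std::replace_if(line.begin(), line.end(),
                    [](unsigned char c) { return !std::isalpha(c); },
                    ' ');
    return line;
}

// Hypothetical helper: calls `sink(word)` once for each word in `line`.
// In the question's code the sink would be table1.addItem(word).
template <typename Sink>
void forEachWord(const std::string& line, Sink sink)
{
    std::istringstream iss(blankNonLetters(line));
    std::string word;
    while (iss >> word)   // operator>> skips any run of whitespace
        sink(word);
}
```

Inside the `getline` loop this could be called as `forEachWord(line, [&](const std::string& w) { table1.addItem(w); });`. Because apostrophes and digits count as non-letters here, "don't" would come out as "don" and "t"; whether that is acceptable depends on the assignment.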
Jiří Pospíšil

If you are free to use Boost, you can do the following:

$ cat kk.txt
If they had known;; you ... wished it, the entertainment.would have

You can customize the behavior of tokenizer if needed but the default should be sufficient.

#include <iostream>
#include <fstream>
#include <string>

#include <boost/tokenizer.hpp>

int main()
{
  std::ifstream is("./kk.txt");
  std::string line;

  while (std::getline(is, line)) {
    boost::tokenizer<> tokens(line);

    for (const auto& word : tokens)
      std::cout << word << '\n';
  }

  return 0;
}

And finally

$ ./a.out
If
they
had
known
you
wished
it
the
entertainment
would
have