Sentence extraction from paragraph


Using strtok, one can get each token in the paragraph individually.

I want to capture each sentence on the page individually so that I can process them separately.

One solution is to loop over the text and check each character: if it is a '.', I treat the sentence as complete and store it in some data structure. I don't know which data structure is best suited for this. An array or a vector?

Is there a better way, or a C++ class available to do this?

UPDATE

Later I want to act on negations in each sentence, that is, take keywords such as not, no, and nope into account. If a negation is followed by a negative word, the pair should be treated as a positive word.
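A minimal sketch of that negation rule, assuming a negation only affects the word immediately after it. The function name sentence_polarity and both word lists are hypothetical placeholders, not part of any library; a real lexicon would be much larger.

```cpp
#include <cctype>
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper: returns -1 for each negative word, +1 for each
// negative word that directly follows a negation, summed over the sentence.
int sentence_polarity(const std::string& sentence) {
    // Placeholder word lists, chosen only for illustration.
    static const std::unordered_set<std::string> negators = {"not", "no", "nope"};
    static const std::unordered_set<std::string> negative_words = {"bad", "poor", "ugly"};

    std::istringstream in(sentence);
    std::string word;
    bool negated = false;  // true if the previous word was a negation
    int score = 0;
    while (in >> word) {
        // Normalise the token: keep letters only, in lowercase.
        std::string clean;
        for (unsigned char c : word) {
            if (std::isalpha(c)) clean += static_cast<char>(std::tolower(c));
        }
        if (negators.count(clean)) {
            negated = true;
            continue;
        }
        if (negative_words.count(clean)) {
            score += negated ? 1 : -1;  // "not bad" counts as positive
        }
        negated = false;  // the negation applies to one following word only
    }
    return score;
}
```

Under these assumptions, sentence_polarity("This is not bad.") yields 1 and sentence_polarity("This is bad.") yields -1. Something like "not a bad idea" is not flipped, because the negation is dropped after one intervening word.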


There is 1 answer

5
dalle On

Since you are using C++, the best type for storing a string is the std::string class. Store multiple strings in a std::vector<std::string>. Also, don't use strtok; use std::getline instead.
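As a rough sketch of that suggestion, std::getline can take '.' as its delimiter and push each piece into a std::vector<std::string>. The function name split_sentences is hypothetical, and treating every '.' as a sentence end is a simplifying assumption (it will split abbreviations such as "e.g." and ignores '?' and '!').

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits text on '.' and returns the pieces,
// without the terminating '.' and without leading whitespace.
std::vector<std::string> split_sentences(const std::string& text) {
    std::vector<std::string> sentences;
    std::istringstream in(text);
    std::string piece;
    // std::getline reads up to (and discards) the next '.' character.
    while (std::getline(in, piece, '.')) {
        // Skip whitespace left over after the previous sentence.
        const auto first = piece.find_first_not_of(" \t\r\n");
        if (first == std::string::npos) continue;  // empty or blank piece
        sentences.push_back(piece.substr(first));
    }
    return sentences;
}
```

Calling split_sentences("First one. Second one.") gives a vector holding "First one" and "Second one". The vector grows as needed, which is why it suits this better than a fixed-size array.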

But since you are doing text manipulation, and perhaps international text manipulation, you should take a look at the ICU library, in particular icu::BreakIterator::createSentenceInstance.