Remove stop words from a string in ASP.NET C#


I am having trouble creating code which removes stop words from a string. Here is my code:

String Review="The portfolio is fine except for the fact that the last movement of sonata #6 is missing. What should one expect?";

string[] arrStopword = new string[] {"a", "i", "it", "am", "at", "on", "in", "to", "too", "very","of", "from", "here", "even", "the", "but", "and", "is","my","them", "then", "this", "that", "than", "though", "so", "are"};
StringBuilder sbReview = new StringBuilder(Review);
foreach (string word in arrStopword)
{
    sbReview.Replace(word, "");
}
Label1.Text = sbReview.ToString();

When I run it, Label1.Text is "The portfolo s fne except for fct tht lst movement st #6 s mssng. Wht should e expect? "

I expected it to return "portfolio fine except for fact last movement sonata #6 is missing. what should one expect?"

Does anybody know how to solve this?

There are 4 answers

0
Quark On BEST ANSWER

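You could search for " a ", " I ", etc. so that the program only removes those strings when they are used as whole words (that is, with spaces around them). Replace each match with a single space so the neighbouring words stay separated. Note that this misses stop words at the very start or end of the string or next to punctuation, and that StringBuilder.Replace is case-sensitive, so "The" is not matched by " the ".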
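
One way to get whole-word matching without depending on surrounding spaces is a regular expression with \b word boundaries. The following is a minimal sketch, not part of the original answer; the class and method names (StopWordRegex, Remove) are made up for illustration, and it assumes case-insensitive matching is wanted.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StopWordRegex
{
    // Hypothetical helper: removes whole-word matches only, ignoring case.
    public static string Remove(string text, string[] stopWords)
    {
        // Build \b(?:a|i|it|...)\b; Regex.Escape guards against special characters.
        string alternatives = string.Join("|", stopWords.Select(w => Regex.Escape(w)));
        string pattern = @"\b(?:" + alternatives + @")\b";

        string stripped = Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);

        // Collapse the runs of whitespace left behind by the removed words.
        return Regex.Replace(stripped, @"\s+", " ").Trim();
    }
}
```

Calling StopWordRegex.Remove(Review, arrStopword) leaves words such as "sonata" and "fine" intact, because "on", "at" and "i" only match when they stand alone. A removed word that was followed by punctuation leaves a space before that punctuation, which may need extra cleanup.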

0
David Pilkington On

The problem is that you are replacing substrings, not words. You need to split the original text into words, remove the stop words, and then join the remaining words again.

Try this:

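// Requires: using System; using System.Collections.Generic; using System.Linq;
List<string> words = Review.Split(' ').ToList();
// Remove every occurrence of each stop word, ignoring case ("The" matches "the").
foreach (string stopWord in arrStopword)
    words.RemoveAll(w => string.Equals(w, stopWord, StringComparison.OrdinalIgnoreCase));
string result = String.Join(" ", words);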

The only issue I can see with this is that it doesn't handle punctuation well (a token such as "is." will not match the stop word "is"), but you get the general idea.
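
The punctuation issue can be reduced by stripping leading and trailing punctuation from each token before comparing it against the stop words. This is only a sketch under that assumption; StopWordSplit, Remove and TrimPunctuation are hypothetical names, not something from the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StopWordSplit
{
    // Hypothetical helper: drops tokens that are stop words once edge punctuation is ignored.
    public static string Remove(string text, IEnumerable<string> stopWords)
    {
        var stopSet = new HashSet<string>(stopWords, StringComparer.OrdinalIgnoreCase);
        char[] separators = { ' ' };

        var kept = text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                       .Where(token => !stopSet.Contains(TrimPunctuation(token)));

        return string.Join(" ", kept);
    }

    // Returns the token without leading or trailing punctuation characters.
    private static string TrimPunctuation(string token)
    {
        int start = 0, end = token.Length;
        while (start < end && char.IsPunctuation(token[start])) start++;
        while (end > start && char.IsPunctuation(token[end - 1])) end--;
        return token.Substring(start, end - start);
    }
}
```

The trade-off is that a dropped token takes its punctuation with it: "is," disappears entirely, comma included. Whether that is acceptable depends on what the cleaned text is used for.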

2
Nitin Varpe On

You can use LINQ to solve this problem. First split the string on spaces into a sequence of words, then use Except to drop the stop words, and finally apply string.Join to rebuild the sentence. Note that Except is a set operation: it also removes duplicate words from the result, and by default it compares case-sensitively, so "The" is kept.

var newString = string.Join(" ", Review.Split(' ').Except(arrStopword));
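
Because Except returns a set difference, repeated words in the input appear only once in its output. If repeated words should be preserved, a Where filter is a possible alternative. The sketch below contrasts the two; the class and method names are hypothetical, and the case-insensitive comparison in WithWhere is an assumption about the desired behaviour.

```csharp
using System;
using System.Linq;

public static class ExceptVersusWhere
{
    // Set difference: stop words removed, but duplicates in the text are collapsed too.
    public static string WithExcept(string text, string[] stopWords)
    {
        return string.Join(" ", text.Split(' ').Except(stopWords));
    }

    // Filter: keeps every non-stop word in order, including repeats; ignores case.
    public static string WithWhere(string text, string[] stopWords)
    {
        return string.Join(" ", text.Split(' ')
            .Where(w => !stopWords.Contains(w, StringComparer.OrdinalIgnoreCase)));
    }
}
```

For the input "the cat and the other cat" with stop words "the" and "and", WithExcept returns "cat other" while WithWhere returns "cat other cat".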
0
othman.Da On

Alternatively, you can use the dotnet-stop-words package and simply call its RemoveStopWords method on the string:

(yourString).RemoveStopWords("en");