Stop words removal using C#

I have two string arrays:

string[] text = {"Paragraph 1 containing long text of ten to 20 lines", "Paragraph 2 containing long text of ten to 20 lines", "Paragraph 3 containing long text of ten to 20 lines", /* ... */};

and another array of stop words:

string[] stop_words = File.ReadAllLines(@"C:\stopWords.txt");
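If the stop-word file holds one word per line (an assumption about C:\stopWords.txt), loading it into a HashSet<string> gives fast, case-insensitive lookups and drops blank lines or stray spaces. A minimal sketch as a C# 9+ top-level program; LoadStopWords is a hypothetical helper name, and the temporary file only stands in for the real one:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: reads one stop word per line into a case-insensitive set.
HashSet<string> LoadStopWords(string path) =>
    new HashSet<string>(
        File.ReadAllLines(path)
            .Select(line => line.Trim())     // remove stray spaces around each word
            .Where(line => line.Length > 0), // skip blank lines
        StringComparer.OrdinalIgnoreCase);   // "The" and "the" count as the same word

// Stand-in for C:\stopWords.txt: a temporary file with messy entries.
string path = Path.GetTempFileName();
File.WriteAllLines(path, new[] { "The", " and ", "", "of" });

HashSet<string> stopWords = LoadStopWords(path);
File.Delete(path);

Console.WriteLine(stopWords.Count);           // 3
Console.WriteLine(stopWords.Contains("THE")); // True
```

Whether matching should ignore case is a design choice; drop the StringComparer argument if "The" and "the" should be treated differently.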

The text array contains paragraphs of text, and the stop_words array contains the stop words to be removed from every paragraph stored in text.

How can the stop words be removed using C#? Code suggestions would be highly appreciated.

Thanks

There are 2 answers

Answer by Rahul Tripathi

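Try this (requires using System.Linq). Except compares whole array elements, so calling text.Except(stop_words) directly would only drop a paragraph that is itself a stop word. Split each paragraph into words and filter those instead:

string[] result = text.Select(p => string.Join(" ", p.Split(' ').Where(w => !stop_words.Contains(w)))).ToArray();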
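Enumerable.Except has set semantics: it compares whole elements and returns only distinct values. The sketch below (a top-level C# program with made-up sample data) shows both effects, and why filtering the split words with Where keeps repeated words that Except would drop:

```csharp
using System;
using System.Linq;

string[] stopWords = { "the", "a" };
string[] paragraphs = { "the cat saw the dog", "a dog saw a cat" };

// Except on the paragraph array compares whole strings, so no paragraph is removed here.
string[] wholeStrings = paragraphs.Except(stopWords).ToArray();

string sentence = "the cat saw the cat";

// Except on the words removes the stop words but also collapses repeated words.
string viaExcept = string.Join(" ", sentence.Split(' ').Except(stopWords));

// Where tests each word on its own, so repeats and word order are kept.
string viaWhere = string.Join(" ", sentence.Split(' ').Where(w => !stopWords.Contains(w)));

Console.WriteLine(wholeStrings.Length); // 2
Console.WriteLine(viaExcept);           // cat saw
Console.WriteLine(viaWhere);            // cat saw cat
```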

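Or else you can use a loop. A plain str.Replace(word, "") is case-sensitive (it leaves "Please" untouched) and also cuts matches out of longer words (e.g. "try" inside "country"), so this version matches whole words and ignores case:

using System.Text.RegularExpressions;

string[] stop_word = new string[] { "please", "try", "something" };

string str = "Please try something by yourself before asking";
foreach (string word in stop_word)
{
   // \b restricts the match to whole words; IgnoreCase lets "please" match "Please"
   str = Regex.Replace(str, @"\b" + Regex.Escape(word) + @"\b", "", RegexOptions.IgnoreCase);
}
// collapse the extra spaces left where words were removed
str = Regex.Replace(str, @"\s+", " ").Trim(); // "by yourself before asking"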
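Instead of one replacement per stop word, the words can also be combined into a single alternation pattern such as \b(?:please|try|something)\b and removed in one regex pass. A sketch assuming stop words should match whole words, ignoring case; RemoveStopWords is a hypothetical helper, and the naive Replace line is there only to show the substring problem:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: removes whole-word stop words in a single regex pass.
string RemoveStopWords(string input, string[] stopWords)
{
    // Build \b(?:w1|w2|...)\b; Regex.Escape makes any metacharacters in a word literal.
    string pattern = @"\b(?:" + string.Join("|", stopWords.Select(w => Regex.Escape(w))) + @")\b";
    string stripped = Regex.Replace(input, pattern, "", RegexOptions.IgnoreCase);
    // Collapse the whitespace runs left where words were removed.
    return Regex.Replace(stripped, @"\s+", " ").Trim();
}

string[] stop = { "please", "try", "something" };
string sentence = "Please try something in this country";

string naive = sentence.Replace("try", "");   // also cuts "try" out of "country"
string safe = RemoveStopWords(sentence, stop);

Console.WriteLine(naive);
Console.WriteLine(safe); // in this country
```

One combined pattern scans each paragraph once, which may matter when the stop-word list is long.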

Answer by tariq

Let me explain the flow:

1) Iterate over the input_Texts string array.

2) Inside the loop, split each paragraph on the space character (' ') to get its words.

3) Find the words that also appear in stopWords (the intersection of the two).

4) Keep all the words except those matching words.

5) Join the remaining words with spaces to rebuild the paragraph, now free of stop words, and store it back at the same index.

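  using System.Linq;
  for (int i = 0; i < input_Texts.Length; i++)
  {
    string[] words = input_Texts[i].Split(' ');
    string[] matching = words.Intersect(stopWords).ToArray();
    // Where (unlike Except) keeps repeated words and their original order
    input_Texts[i] = string.Join(" ", words.Where(w => !matching.Contains(w)));
  }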

Can you follow this?
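The same five steps can be made a little more forgiving of real paragraphs. A minimal sketch (top-level C# program), assuming case should be ignored and that punctuation attached to a word (such as "The," or "lines.") should not prevent a match; the sample paragraphs, the stop words and the punctuation list are illustrative only:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Case-insensitive stop-word set; HashSet lookups avoid scanning an array per word.
var stopWords = new HashSet<string>(new[] { "the", "of", "to" }, StringComparer.OrdinalIgnoreCase);

string[] input_Texts =
{
    "The paragraph contains text of ten to 20 lines.",
    "Another paragraph, the second one."
};

// Punctuation ignored when comparing a word with the stop-word list (illustrative, not exhaustive).
char[] punctuation = { '.', ',', ';', ':', '!', '?', '"', '(', ')' };
char[] whitespace = { ' ', '\t', '\r', '\n' };

for (int i = 0; i < input_Texts.Length; i++)
{
    // Step 2: split on whitespace, dropping empty entries caused by repeated spaces.
    string[] words = input_Texts[i].Split(whitespace, StringSplitOptions.RemoveEmptyEntries);
    // Steps 3 and 4: keep each word whose punctuation-trimmed form is not a stop word.
    // Step 5: rejoin the kept words (with their original punctuation) into the paragraph.
    input_Texts[i] = string.Join(" ", words.Where(w => !stopWords.Contains(w.Trim(punctuation))));
}

// Prints "paragraph contains text ten 20 lines." then "Another paragraph, second one."
foreach (string p in input_Texts)
    Console.WriteLine(p);
```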