How can I count occurrences of two words following each other in a string in C#?

I wrote a one-word version using a regex like this:

using System.Collections.Generic;
using System.Text.RegularExpressions;

public Dictionary<string, int> MakeOneWordDictionary(string content)
{
    Dictionary<string, int> words = new Dictionary<string, int>();
    // Regex matching one word (a run of word characters)
    var wordPattern = new Regex(@"\w+");
    // Clean the text by removing punctuation marks (RemoveSigns is a helper not shown here)
    content = RemoveSigns(content);
    foreach (Match match in wordPattern.Matches(content))
    {
        int currentCount = 0;
        words.TryGetValue(match.Value, out currentCount);
        currentCount++;
        words[match.Value] = currentCount;
    }
    return words;
}

And it gives an output like this

This code returns each word and its frequency in a dictionary. Now I need a two-word version that counts occurrences of two words following each other in a string.

Should I modify the regex? If so, how?

Best answer, by Paul Kertscher

I think this can be written in a more self-explanatory way without a regex.

string input = "a a b test a a";
string[] words = input.Split(' ');

var combinations = from index in Enumerable.Range(0, words.Length-1)
                   select new Tuple<string,string>(words[index], words[index+1]);

var groupedTuples = combinations.GroupBy(t => t);
var countedCombinations = groupedTuples.Select(g => new { Value = g.First(), Count = g.Count()});

The first two lines define the input and split it on spaces, i.e. separate it into single words. The third statement goes through the array of words from the first to the (N-1)th element (where N is the number of words) and builds a tuple of the n-th and the (n+1)-th element. In the fourth statement these tuples are grouped by themselves: Tuple<T1,T2> compares its items by value, so two tuples with the same elements are considered equal. In the last statement, the elements of each group are counted, and each count is stored in an anonymously typed object along with its respective value.
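A self-contained version of the snippet above, as a minimal sketch: the class and method names (BigramDemo, CountPairs) are hypothetical, and StringSplitOptions.RemoveEmptyEntries plus the Math.Max guard are additions so that repeated spaces or an empty string do not break it. Because an anonymous type cannot be returned from a method, the counts are returned as KeyValuePair entries instead.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BigramDemo
{
    // Counts adjacent word pairs using the answer's Split / Range / GroupBy steps.
    public static List<KeyValuePair<Tuple<string, string>, int>> CountPairs(string input)
    {
        // Split on spaces; RemoveEmptyEntries (an addition) ignores double spaces.
        string[] words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Pair word n with word n+1 for n = 0 .. words.Length - 2.
        // Math.Max avoids a negative count when there are no words at all.
        var combinations = from index in Enumerable.Range(0, Math.Max(words.Length - 1, 0))
                           select Tuple.Create(words[index], words[index + 1]);

        // Tuple<,> has value equality, so identical pairs fall into one group.
        return combinations
            .GroupBy(t => t)
            .Select(g => new KeyValuePair<Tuple<string, string>, int>(g.Key, g.Count()))
            .ToList();
    }

    static void Main()
    {
        foreach (var pair in CountPairs("a a b test a a"))
            Console.WriteLine($"{pair.Key.Item1} {pair.Key.Item2}: {pair.Value}");
    }
}
```

GroupBy yields groups in the order their keys first appear, so for the sample input `a a: 2` is printed first, followed by `a b`, `b test` and `test a` with a count of 1 each.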

This logic can also be applied to your regex version.
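Applying the same idea to the regex version might look like the following sketch. MakeTwoWordDictionary and the TwoWordCounter class are hypothetical names, the RemoveSigns call is left out because that helper is not shown (\w+ skips punctuation between words anyway), and each pair is keyed as a single "first second" string to mirror the string keys of the one-word dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TwoWordCounter
{
    // Hypothetical two-word counterpart of MakeOneWordDictionary:
    // remembers the previous regex match and counts (previous, current) pairs.
    public static Dictionary<string, int> MakeTwoWordDictionary(string content)
    {
        var pairs = new Dictionary<string, int>();
        var wordPattern = new Regex(@"\w+");

        string previous = null;
        foreach (Match match in wordPattern.Matches(content))
        {
            if (previous != null)
            {
                // Join the two words with a space; \w never matches a space,
                // so the key cannot be confused with a different pair.
                string key = previous + " " + match.Value;
                pairs.TryGetValue(key, out int currentCount);
                pairs[key] = currentCount + 1;
            }
            previous = match.Value;
        }
        return pairs;
    }

    static void Main()
    {
        foreach (var entry in MakeTwoWordDictionary("a a, b test. a a"))
            Console.WriteLine($"{entry.Key}: {entry.Value}");
    }
}
```

Note that this counts pairs across punctuation, so "test. a" still produces the pair "test a"; if sentence boundaries should break a pair, that would need extra handling.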

Edit: To get a dictionary, as in your example, you can use the ToDictionary extension method:

var countedCombinations = groupedTuples.ToDictionary(g => g.First(), g => g.Count());

The first argument is the key selector and the second is the value selector.
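A minimal sketch of the ToDictionary variant, assuming the same space-separated input (PairDictionaryDemo and CountPairs are hypothetical names). It uses g.Key as the key selector, which for GroupBy(t => t) is the same pair as g.First(), and g.Count() as the value selector.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PairDictionaryDemo
{
    public static Dictionary<Tuple<string, string>, int> CountPairs(string input)
    {
        string[] words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        return Enumerable.Range(0, Math.Max(words.Length - 1, 0))
            .Select(i => Tuple.Create(words[i], words[i + 1]))
            .GroupBy(t => t)
            // Key selector: the grouping key, i.e. the pair itself.
            // Value selector: how many times that pair occurred.
            .ToDictionary(g => g.Key, g => g.Count());
    }

    static void Main()
    {
        var counts = CountPairs("a a b test a a");
        // Tuple<,> has value equality, so a freshly created tuple finds the entry.
        Console.WriteLine(counts[Tuple.Create("a", "a")]); // prints 2
    }
}
```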