I wrote a one-word version using a regex like this:
    public Dictionary<string, int> MakeOneWordDictionary(string content)
    {
        Dictionary<string, int> words = new Dictionary<string, int>();
        // Regex that matches a single word
        var wordPattern = new Regex(@"\w+");
        // Strip punctuation marks from the text
        content = RemoveSigns(content);
        foreach (Match match in wordPattern.Matches(content))
        {
            // TryGetValue leaves currentCount at 0 when the word is new
            int currentCount = 0;
            words.TryGetValue(match.Value, out currentCount);
            currentCount++;
            words[match.Value] = currentCount;
        }
        return words;
    }
This piece of code returns each word and its frequency in a dictionary. I now need a two-word version that counts occurrences of two words following each other in a string.
Should I modify the regex? If so, how should I modify it?
I think this can be written in a more self-explanatory way without RegExp.
The first two lines define the input and split it by spaces, i.e. separate it into single words. The third line goes through the array of words from the first to the (N-1)-th element (where N is the number of words) and builds a tuple of the n-th and the (n+1)-th element. In the fourth line these tuples are grouped by themselves (two tuples with the same elements are considered equal). In the last line, the elements of each group are counted, and the counts are stored in an anonymously typed variable along with their respective values.
This logic can also be applied to your RegExp version.
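The steps described above can be sketched roughly as follows. This is a minimal illustration, not necessarily the exact code the description refers to: the class name `WordPairSketch`, the method name `CountWordPairs`, and the formatted string output are assumptions made so the example is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical class and method names, chosen for illustration only.
public static class WordPairSketch
{
    public static List<string> CountWordPairs(string input)
    {
        // Split the input by spaces into single words.
        string[] words = input.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Pair the n-th word with the (n+1)-th word, for n = 0 .. N-2.
        // Math.Max guards against inputs with fewer than two words.
        var pairs = Enumerable.Range(0, Math.Max(0, words.Length - 1))
            .Select(n => Tuple.Create(words[n], words[n + 1]));

        // Group the tuples by themselves; Tuple compares element by element.
        var groups = pairs.GroupBy(pair => pair);

        // Count each group and keep the count next to its pair (anonymous type).
        var counts = groups.Select(g => new { Pair = g.Key, Count = g.Count() });

        // Format each entry as "first second: count" so the result can be returned.
        return counts
            .Select(c => c.Pair.Item1 + " " + c.Pair.Item2 + ": " + c.Count)
            .ToList();
    }
}
```

For example, calling `WordPairSketch.CountWordPairs("a b a b c")` yields the entries `a b: 2`, `b a: 1` and `b c: 1`, in order of first appearance.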
Edit: To get a dictionary, like in your example, you can use the ToDictionary extension method. The first parameter is a selector method for the key, the second one for the value.
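As a rough sketch of the ToDictionary variant, the example below also shows one way the same pairing logic might be combined with the `\w+` pattern from the question. The class name `WordPairCounter`, the method name `MakeTwoWordDictionary`, and the choice of a `Tuple<string, string>` key are assumptions for illustration; the `RemoveSigns` step from the question is left out because its implementation is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical class and method names, chosen for illustration only.
public static class WordPairCounter
{
    public static Dictionary<Tuple<string, string>, int> MakeTwoWordDictionary(string content)
    {
        // Extract the words with the same \w+ pattern used in the question.
        string[] words = Regex.Matches(content, @"\w+")
            .Cast<Match>()
            .Select(match => match.Value)
            .ToArray();

        return Enumerable.Range(0, Math.Max(0, words.Length - 1))
            // Pair each word with the word that follows it.
            .Select(n => Tuple.Create(words[n], words[n + 1]))
            // Equal pairs end up in the same group.
            .GroupBy(pair => pair)
            // First selector: the key (the pair). Second selector: the value (its count).
            .ToDictionary(group => group.Key, group => group.Count());
    }
}
```

A lookup such as `MakeTwoWordDictionary("a, b a b. c")[Tuple.Create("a", "b")]` then returns 2. If string keys are preferred, as in the one-word version, the key selector could instead join the two words with a space.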