C# Regex - get values from repeatable groups

I have a regex pattern and I want to find out whether a sentence (string) matches it.

My Pattern:

@"^A\s(?<TERM1>[A-Z][a-z]{1,})\sconsists\sof\s((?<MINIMUM1>(\d+))\sto\s(?<MAXIMUM1>(\d+|many){1})|(?<MINMAX1>(\d+|many{1}){1}){1})\s(?<TERM2>[A-Z][a-z]{1,})(\sand\s((?#********RepeatablePart********)(?<MINIMUM2>(\d+))\sto\s(?<MAXIMUM2>(\d+|many){1})|(?<MINMAX2>(\d+|many{1}){1}){1})\s(?<TERM3>([A-Z][a-z]{1,})))+\.$"

How to read my pattern:

A (TERM1) consists of (MINIMUM1 to (MAXIMUM1|many)|(MINMAX1|many)) (TERM2) ((?#********RepeatablePart********)and (MINIMUM2 to (MAXIMUM2|many)|(MINMAX2|many)) (TERM3))+.

MINMAX1/MINMAX2 can be a number or the word 'many'; MINIMUM1/MINIMUM2 is always a number; MAXIMUM1/MAXIMUM2 can be a number or the word 'many'.

Example Sentences:

  1. A Car consists of 2 to 5 Seats and 1 Breakpedal and 1 Gaspedal and 4 to 6 Windows.
  2. A Tree consists of many Apples and 2 to many Colors and 0 to 1 Squirrel and many Leaves.
  3. A Book consists of 1 to many Authors and 1 Title and 3 Bookmarks.

    1. would contain: TERM1 = Car, MINIMUM1 = 2, MAXIMUM1 = 5, MINMAX1 = null, TERM2 = Seats, MINIMUM2 = null, MAXIMUM2 = null, MINMAX2 = 1, TERM3 = Breakpedal, MINIMUM2 = null, MAXIMUM2 = null, MINMAX2 = 1, TERM3 = Gaspedal, MINIMUM2 = 4, MAXIMUM2 = 6, MINMAX2 = null, TERM3 = Windows
    2. would contain: TERM1 = Tree, MINIMUM1 = null, MAXIMUM1 = null, MINMAX1 = many, TERM2 = Apples, MINIMUM2 = 2, MAXIMUM2 = many, MINMAX2 = null, TERM3 = Colors, MINIMUM2 = 0, MAXIMUM2 = 1, MINMAX2 = null, TERM3 = Squirrel, MINIMUM2 = null, MAXIMUM2 = null, MINMAX2 = many, TERM3 = Leaves
    3. would contain: TERM1 = Book, MINIMUM1 = 1, MAXIMUM1 = many, MINMAX1 = null, TERM2 = Authors, MINIMUM2 = null, MAXIMUM2 = null, MINMAX2 = 1, TERM3 = Title, MINIMUM2 = null, MAXIMUM2 = null, MINMAX2 = 3, TERM3 = Bookmarks
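As a reading aid, here is a sketch (assuming C# 9 or later, top-level statements) of the same grammar written with RegexOptions.IgnorePatternWhitespace, so that each part can carry a comment. The {1} quantifiers are dropped because they are redundant ("many{1}" matches exactly "many"), the unnamed inner groups become non-capturing (?:...), and the repeatable minimum group is spelled MINIMUM2, the name the C# code in this question reads. The program runs the pattern over the three example sentences.

```csharp
using System;
using System.Text.RegularExpressions;

// In IgnorePatternWhitespace mode, spaces/newlines in the pattern are ignored and
// '#' starts a comment, so literal spaces in the input are matched with \s.
const string pattern = @"
    ^A\s(?<TERM1>[A-Z][a-z]+)\sconsists\sof\s
    (?: (?<MINIMUM1>\d+)\sto\s(?<MAXIMUM1>\d+|many)     # MINIMUM1 to MAXIMUM1
      | (?<MINMAX1>\d+|many) )                          # ...or a single MINMAX1
    \s(?<TERM2>[A-Z][a-z]+)
    (?: \sand\s                                         # repeatable part, one or more times:
        (?: (?<MINIMUM2>\d+)\sto\s(?<MAXIMUM2>\d+|many) #   MINIMUM2 to MAXIMUM2
          | (?<MINMAX2>\d+|many) )                      #   ...or a single MINMAX2
        \s(?<TERM3>[A-Z][a-z]+)
    )+
    \.$";

var regex = new Regex(pattern, RegexOptions.IgnorePatternWhitespace);

string[] sentences =
{
    "A Car consists of 2 to 5 Seats and 1 Breakpedal and 1 Gaspedal and 4 to 6 Windows.",
    "A Tree consists of many Apples and 2 to many Colors and 0 to 1 Squirrel and many Leaves.",
    "A Book consists of 1 to many Authors and 1 Title and 3 Bookmarks.",
};

// Prints whether each sentence matches, plus its TERM1 value.
foreach (var s in sentences)
{
    var m = regex.Match(s);
    Console.WriteLine($"{m.Success}: {m.Groups["TERM1"].Value}");
}
// Output: True: Car / True: Tree / True: Book (one per line)
```

A sentence without any "and ..." part, such as "A Car consists of 2 Seats.", does not match, because the repeatable group is quantified with +.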

I created a class which I would like to fill with the values of the repeatable part of my string (namely MINIMUM2, MAXIMUM2, MINMAX2 and TERM3):

//MyObject contains the values of one expression from the repeatable part.
public class MyObject
{   
    public string term { get; set; }
    public string min { get; set; }
    public string max { get; set; }
    public string minmax { get; set; }
}

Since my pattern has a repeatable part (+), I want to create a List<MyObject> and add one MyObject per repetition, filled with the values of the repeatable groups.

My problem is that I'm not sure how to fill my objects with the values of the repeatable parts. My attempt is wrong because the capture lists don't contain the same number of values: each repetition matches either MINIMUM2 and MAXIMUM2, or MINMAX2, never all three. For example, in 'A Book consists of 1 to many Authors and 1 Title and 3 Bookmarks.', MINIMUM2 captures nothing while MINMAX2 captures 1 and 3.
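The mismatch can be reproduced with a small sketch (assuming C# 9 or later, top-level statements; the pattern is the question's pattern with the redundant {1} quantifiers removed and the group spelled MINIMUM2). Each named group keeps its own flat list of captures across all repetitions, so for the first example sentence the four lists have different lengths, and index i no longer refers to the same repetition in every list.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

const string pattern =
    @"^A\s(?<TERM1>[A-Z][a-z]+)\sconsists\sof\s" +
    @"(?:(?<MINIMUM1>\d+)\sto\s(?<MAXIMUM1>\d+|many)|(?<MINMAX1>\d+|many))\s(?<TERM2>[A-Z][a-z]+)" +
    @"(?:\sand\s(?:(?<MINIMUM2>\d+)\sto\s(?<MAXIMUM2>\d+|many)|(?<MINMAX2>\d+|many))\s(?<TERM3>[A-Z][a-z]+))+\.$";

var match = Regex.Match(
    "A Car consists of 2 to 5 Seats and 1 Breakpedal and 1 Gaspedal and 4 to 6 Windows.",
    pattern);

// Print every capture of each repeatable group; the list lengths differ.
foreach (var name in new[] { "MINIMUM2", "MAXIMUM2", "MINMAX2", "TERM3" })
{
    var values = match.Groups[name].Captures.Cast<Capture>().Select(c => c.Value);
    Console.WriteLine($"{name}: [{string.Join(", ", values)}]");
}
// Output:
// MINIMUM2: [4]
// MAXIMUM2: [6]
// MINMAX2: [1, 1]
// TERM3: [Breakpedal, Gaspedal, Windows]
```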

Is there a simpler way to fill my objects, or how can I get the values of the repeated (quantified) part?

My code (in C#):

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var match = Regex.Match(exampleText, pattern);
if (match.Success)
{
    string term1 = match.Groups["TERM1"].Value;
    string minimum1 = match.Groups["MINIMUM1"].Value;
    string maximum1 = match.Groups["MAXIMUM1"].Value;
    string minmax1 = match.Groups["MINMAX1"].Value;
    string term2 = match.Groups["TERM2"].Value;

    //--> Groups[].Captures...ToList() might be wrong. Maybe there is a better way to get the values of the repeatable part.
    List<string> minimums2 = match.Groups["MINIMUM2"].Captures.Cast<Capture>().Select(x => x.Value).ToList();
    List<string> maximums2 = match.Groups["MAXIMUM2"].Captures.Cast<Capture>().Select(x => x.Value).ToList();
    List<string> minmaxs2 = match.Groups["MINMAX2"].Captures.Cast<Capture>().Select(x => x.Value).ToList();
    List<string> terms3 = match.Groups["TERM3"].Captures.Cast<Capture>().Select(x => x.Value).ToList();

    List<MyObject> myList = new List<MyObject>();

    for (int i = 0; i < terms3.Count; i++)
    {
        myList.Add(new MyObject()
        {
            term = terms3[i],
            min = minimums2[i],   //--> ERROR: ArgumentOutOfRangeException when minimums2 has fewer values than terms3
            max = maximums2[i],   //--> same problem
            minmax = minmaxs2[i]  //--> same problem; even without an exception, index i may belong to a different repetition
        });
    }
}
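One single-regex alternative, sketched under the assumption that the values can be aligned by position: .NET records the start offset of every capture in Capture.Index, so wrapping each repetition in an extra named group (PART, a name introduced here for illustration) yields exactly one capture per repetition. For each PART capture, the MINIMUM2/MAXIMUM2/MINMAX2/TERM3 capture lying inside it is picked out, and a branch that was not taken in that repetition simply yields null. The helper ValueInside is hypothetical, the code assumes C# 9 or later (top-level statements), and a tuple stands in for MyObject to keep the program self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Question's grammar plus a hypothetical PART group around one whole repetition.
const string pattern =
    @"^A\s(?<TERM1>[A-Z][a-z]+)\sconsists\sof\s" +
    @"(?:(?<MINIMUM1>\d+)\sto\s(?<MAXIMUM1>\d+|many)|(?<MINMAX1>\d+|many))\s(?<TERM2>[A-Z][a-z]+)" +
    @"(?<PART>\sand\s(?:(?<MINIMUM2>\d+)\sto\s(?<MAXIMUM2>\d+|many)|(?<MINMAX2>\d+|many))\s(?<TERM3>[A-Z][a-z]+))+\.$";

var match = Regex.Match(
    "A Car consists of 2 to 5 Seats and 1 Breakpedal and 1 Gaspedal and 4 to 6 Windows.",
    pattern);

// One entry per repetition; the tuple plays the role of MyObject.
var myList = new List<(string? Term, string? Min, string? Max, string? MinMax)>();

if (match.Success)
{
    foreach (Capture part in match.Groups["PART"].Captures)
    {
        myList.Add((
            ValueInside(match.Groups["TERM3"], part),
            ValueInside(match.Groups["MINIMUM2"], part),
            ValueInside(match.Groups["MAXIMUM2"], part),
            ValueInside(match.Groups["MINMAX2"], part)));
    }
}

foreach (var o in myList)
    Console.WriteLine($"{o.Term}: min={o.Min ?? "null"}, max={o.Max ?? "null"}, minmax={o.MinMax ?? "null"}");
// Output:
// Breakpedal: min=null, max=null, minmax=1
// Gaspedal: min=null, max=null, minmax=1
// Windows: min=4, max=6, minmax=null

// Value of the capture of `group` that lies within `part`, or null if the group
// captured nothing in that repetition (the other alternation branch was taken).
static string? ValueInside(Group group, Capture part) =>
    group.Captures.Cast<Capture>()
        .FirstOrDefault(c => c.Index >= part.Index && c.Index + c.Length <= part.Index + part.Length)
        ?.Value;
```

Mapping each tuple onto MyObject is a one-line change. The design choice is to let the enclosing repetition (PART), rather than a shared list index, decide which values belong together.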

Best answer (by Joe Jonsman):

I solved the problem myself by splitting exampleText on the word 'and', so that the array splittedText contains the beginning of the sentence followed by every phrase of the repeatable part of my pattern.

string[] splittedText = Regex.Split(exampleText, @"\sand\s");
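A quick sketch (assuming C# 9 or later, top-level statements) of what Regex.Split returns for the first example sentence: element 0 is the head of the sentence and the remaining elements are the repeatable phrases. Only the last phrase keeps the sentence's final period, so a per-phrase pattern has to treat that period as optional.

```csharp
using System;
using System.Text.RegularExpressions;

string exampleText =
    "A Car consists of 2 to 5 Seats and 1 Breakpedal and 1 Gaspedal and 4 to 6 Windows.";

// Split on "and" surrounded by whitespace; the separators are not kept.
string[] splittedText = Regex.Split(exampleText, @"\sand\s");

for (int i = 0; i < splittedText.Length; i++)
    Console.WriteLine($"[{i}] \"{splittedText[i]}\"");
// Output:
// [0] "A Car consists of 2 to 5 Seats"
// [1] "1 Breakpedal"
// [2] "1 Gaspedal"
// [3] "4 to 6 Windows."
```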

After splitting exampleText, I loop over the phrases, run a second Regex.Match on each one to get the values I need, and add a MyObject filled with those values to myList. Because each match only sees a single phrase, the groups can no longer get out of step.

// One phrase, e.g. "1 Breakpedal" or "4 to 6 Windows." (only the last phrase ends with '.', so the period is optional)
string pattern2 = @"^(?:(?<MINIMUM2>\d+)\sto\s(?<MAXIMUM2>\d+|many)|(?<MINMAX2>\d+|many))\s(?<TERM3>[A-Z][a-z]+)\.?$";
List<MyObject> myList = new List<MyObject>();

//i = 1 -> since splittedText[0] contains the beginning of the sentence (e.g. 'A Car consists of 2 to 5 Seats')
for (int i = 1; i < splittedText.Length; i++)
{
    var match2 = Regex.Match(splittedText[i], pattern2);
    if (match2.Success)
    {
        // Groups that did not participate in this phrase have Value == "" (empty string)
        myList.Add(new MyObject()
        {
            term = match2.Groups["TERM3"].Value,
            min = match2.Groups["MINIMUM2"].Value,
            max = match2.Groups["MAXIMUM2"].Value,
            minmax = match2.Groups["MINMAX2"].Value
        });
    }
}