Regex get index of first character not inside double quotes


I am looking for a regex to do the following:

  • Return the index of the first instance of a given character not inside double quotes (I can guarantee a matching closing double quote will always be present and the character to search for will never itself be a double quote)
  • Allow the search to start from a given int startIndex position

Speed is one of my primary concerns here, so the number of iterations should be as small as possible.

Examples (all of them look for !, but the character to search for may differ in practice):

  • !something - should return 0
  • ! - should also return 0
  • something! - should return 9
  • "something!" - should fail
  • "some!thing"! - should return 12
  • !"!"! - should return 0
  • ""! - should return 2
  • ""!"" - should return 2
  • !something with startIndex == 2 - should fail
  • !something! with startIndex == 2 - should return 10 (despite starting at position 2, the returned index is still counted from the start of the string)

Since this is for .NET, the intention is to use Regex.Match().Index (unless a better alternative is suggested).


There are 2 answers

Answer by Anu Viswan

If you really need a regex version, you could use the following pattern:

"(?<searchTerm>!)(?=(?:[^\"]|\"[^\"]*\")*$)"

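The lookahead works because the question guarantees balanced quotes: a ! is outside quotes exactly when the rest of the string splits into non-quote characters and complete "..." sections, i.e. when an even number of quote characters follows it. As a sketch (a C# 9 top-level program, assuming .NET 5 or later; the names VerbosePattern, original and verbose are illustrative only), the same pattern can be written out with RegexOptions.IgnorePatternWhitespace so each part carries a comment:

```csharp
using System;
using System.Text.RegularExpressions;

// The answer's pattern, rewritten with RegexOptions.IgnorePatternWhitespace so each
// part can be commented. In that mode, unescaped whitespace and '#' comments outside
// character classes are ignored; "" is a literal quote inside a verbatim string.
const string VerbosePattern = @"
    (?<searchTerm>!)      # the character to find
    (?=                   # accepted only if the rest of the string
        (?:
            [^""]         #   consists of non-quote characters
          | ""[^""]*""    #   or of complete quoted sections
        )*
        $                 #   all the way to the end
    )";

var original = new Regex("(?<searchTerm>!)(?=(?:[^\"]|\"[^\"]*\")*$)");
var verbose = new Regex(VerbosePattern, RegexOptions.IgnorePatternWhitespace);

// List every unquoted '!' in !"!"! using the commented version.
foreach (Match m in verbose.Matches("!\"!\"!"))
    Console.WriteLine(m.Groups["searchTerm"].Index);
```

For !"!"! this should print 0 and 4: the middle ! is followed by a single quote, so it sits inside a quoted section and is skipped. Note that the lookahead rescans the rest of the string for every candidate character, so inputs with many quoted candidates cost roughly quadratic time.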
For example, given the input

var input = new []
    {
    new {Key= "!something", BeginIndex=0},
    new {Key= "!", BeginIndex=0},
    new {Key= "something!", BeginIndex=0},
    new {Key= "\"something!\"", BeginIndex=0},
    new {Key= "\"some!thing\"!", BeginIndex=0},
    new {Key= "!\"!\"!", BeginIndex=0},
    new {Key= "\"\"!", BeginIndex=0},
    new {Key= "\"\"!\"\"", BeginIndex=0},
    new {Key= "!something", BeginIndex=2},
    new {Key= "!something!", BeginIndex=2},
    new {Key="!\"some!thing\"!",BeginIndex=5}
    };

You can then find the index as follows:

var pattern = "(?<searchTerm>!)(?=(?:[^\"]|\"[^\"]*\")*$)";
Regex regex = new Regex(pattern,RegexOptions.Compiled);
foreach(var str in input)
{
    var index = str.Key.GetIndex(regex,str.BeginIndex);
    Console.WriteLine($"String:{str.Key} , Index : {index}");
}

where GetIndex is defined as:

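using System.Text.RegularExpressions;

public static class Extension
{
    // Returns the index of the first "searchTerm" match at or after beginIndex, or -1.
    public static int GetIndex(this string source, Regex regex, int beginIndex = 0)
    {
        // Every match already satisfies the "outside quotes" lookahead,
        // so just skip the matches that start before beginIndex.
        var match = regex.Match(source);
        while (match.Success)
        {
            if (match.Groups["searchTerm"].Index >= beginIndex)
                return match.Groups["searchTerm"].Index;

            match = match.NextMatch();
        }
        return -1;
    }
}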

Output

String:!something , Index : 0
String:! , Index : 0
String:something! , Index : 9
String:"something!" , Index : -1
String:"some!thing"! , Index : 12
String:!"!"! , Index : 0
String:""! , Index : 2
String:""!"" , Index : 2
String:!something , Index : -1
String:!something! , Index : 10
String:!"some!thing"! , Index : 13
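Because the lookahead only inspects text to the right of each candidate, the NextMatch() loop in GetIndex can probably be replaced by the Regex.Match(string, int) overload, which starts searching at the given position but still reports indices relative to the whole string. A minimal sketch as a C# 9 top-level program; IndexOutsideQuotes and bangRegex are hypothetical names, not part of the answer:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: index of the first unquoted match at or after startIndex, or -1.
// Regex.Match(string, int) begins scanning at startIndex; since the pattern only looks
// ahead, a quote opened before startIndex is still accounted for. Throws
// ArgumentOutOfRangeException if startIndex is outside 0..source.Length.
static int IndexOutsideQuotes(Regex regex, string source, int startIndex = 0)
{
    Match match = regex.Match(source, startIndex);
    return match.Success ? match.Groups["searchTerm"].Index : -1;
}

var bangRegex = new Regex("(?<searchTerm>!)(?=(?:[^\"]|\"[^\"]*\")*$)", RegexOptions.Compiled);

Console.WriteLine(IndexOutsideQuotes(bangRegex, "!something!", 2));      // 10
Console.WriteLine(IndexOutsideQuotes(bangRegex, "!\"some!thing\"!", 5)); // 13
Console.WriteLine(IndexOutsideQuotes(bangRegex, "!something", 2));       // -1
```

With this overload, matches that begin before startIndex are never produced, so no time is spent walking through them.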

Hope that helps.

Answer by Dmitry Bychenko

I suggest a good old for loop instead of regular expressions; let's implement it as an extension method:

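  public static partial class StringExtensions {
    // Returns the index of the first toFind at or after startPosition that is
    // not inside a quoted section, or -1 if there is none.
    public static int IndexOfQuoted(this string value,
                                    char toFind,
                                    int startPosition = 0,
                                    char quotation = '"') {
      if (string.IsNullOrEmpty(value))
        return -1;

      bool inQuotation = false;

      // Scan from 0, not from startPosition: a quote opened before startPosition
      // still decides whether later characters are quoted.
      for (int i = 0; i < value.Length; ++i)
        if (inQuotation)
          inQuotation = value[i] != quotation;   // a closing quote ends the section
        else if (value[i] == toFind && i >= startPosition)
          return i;                              // unquoted match found
        else
          inQuotation = value[i] == quotation;   // an opening quote starts a section

      return -1;
    }
  }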
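If this loop turns out to be the bottleneck, one possible variant of IndexOfQuoted (not from either answer, and not benchmarked here) lets string.IndexOfAny and string.IndexOf do the scanning, so each quoted section is skipped in a single call. IndexOfUnquoted is a hypothetical name; like the loop above, it assumes every opening quote has a matching closing quote. It is written as a C# 9 top-level program so it runs on its own:

```csharp
using System;

// Hypothetical variant of IndexOfQuoted: jumps between the interesting characters
// (toFind or the quote) instead of testing every character in a loop body.
static int IndexOfUnquoted(string value, char toFind, int startPosition = 0, char quotation = '"')
{
    if (string.IsNullOrEmpty(value))
        return -1;

    char[] targets = { toFind, quotation };
    int i = value.IndexOfAny(targets);
    while (i >= 0)
    {
        if (value[i] == quotation)
        {
            // Skip the whole quoted section by jumping to its closing quote.
            int close = value.IndexOf(quotation, i + 1);
            if (close < 0)
                return -1; // unbalanced quotes; the question rules this out
            i = value.IndexOfAny(targets, close + 1);
        }
        else if (i >= startPosition)
        {
            return i; // unquoted toFind at or after startPosition
        }
        else
        {
            // Unquoted toFind, but before startPosition: keep looking.
            i = value.IndexOfAny(targets, i + 1);
        }
    }
    return -1;
}

// The startIndex cases from the question, plus a quote that opens before startIndex.
Console.WriteLine(IndexOfUnquoted("!something", '!', 2));       // -1
Console.WriteLine(IndexOfUnquoted("!something!", '!', 2));      // 10
Console.WriteLine(IndexOfUnquoted("!\"some!thing\"!", '!', 5)); // 13
```

Like the for loop, it examines each character at most once; any speed-up would come from IndexOfAny and IndexOf being optimized inside the runtime, so it is worth measuring on realistic input before choosing between them.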

And so you can use IndexOfQuoted as if it were a method of string:

  string source = "something!";
  int result = source.IndexOfQuoted('!'); 

Demo:

  string[] tests = new string[] {
    "!something",
    "!",
    "something!",
    "\"something!\"",
    "\"some!thing\"!",
    "!\"!\"!",
    "\"\"!",
    "\"\"!\"\"",
  };

  string report = string.Join(Environment.NewLine, tests
    .Select(test => $"{test,-20} -> {test.IndexOfQuoted('!')}"));

  Console.Write(report);

Outcome:

!something           -> 0
!                    -> 0
something!           -> 9
"something!"         -> -1
"some!thing"!        -> 12
!"!"!                -> 0
""!                  -> 2
""!""                -> 2