Why string.Contains performs better than string.IndexOf?

289 views Asked by At

I am trying to optimize a method.

Here is the original method:

private const string Separator = "::";
private char[] SeparatorChars = Separator.ToCharArray();

public string GetSubName(string name)
{
    if (name.Contains(Separator))
    {
        return name.Split(SeparatorChars)[0];
    }
    
    return "INVALID";
}

if name doesn't contain ::, return INVALID else return the first element in the array by generated by Split with ::.

I have wrote the following optimized method:

public string GetSubNameOpt(string name)
{
    var index = name.IndexOf(Separator, StringComparison.Ordinal);
    if (index >= 0)
    {
        return name.AsSpan().Slice(0, index).ToString();
    }

    return "INVALID";
}

The goal was to omit the second O(N) iteration over the string in order to split it in case it contains the :: substring.

I benchmarked both method using BenchmarkDotNet and here are the results enter image description here

Now.. while, as expected, the Optimized method is better in time and memory in the "name contains :: case" due usage of AsSpan and removing of 1 O(N) iteration of the string, it was surprise to me that the Non Optimized method was better for the INVALID case.

EDIT Running with a non empty, not containing :: case. enter image description here

Again, the Optimized method is slower...

Can you explain what is causing this behavior?

2

There are 2 answers

0
Guru Stron On BEST ANSWER

Because string.Contains(string) (i.e. without explicitly specified StringComparison) is a very special beast in recent implementations of .NET and looks something like the following:

public bool Contains(string value)
{
    if (value == null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.value);
 
    return SpanHelpers.IndexOf(
        ref _firstChar,
        Length,
        ref value._firstChar,
        value.Length) >= 0;
}

While the string.Contains(string, StringComparison) overload just call IndexOf:

public bool Contains(string value, StringComparison comparisonType)
{
    return IndexOf(value, comparisonType) >= 0;
}

i.e. name.Contains(Separator, StringComparison.Ordinal) will result in "expected" performance.

And IndexOf for ordinal comparison currently uses internal Ordinal.IndexOf which contains quite a lot of checks and different optimizations for more general use-case.

One thing you can try is using AsSpan().IndexOf which on my machine results in better performance for the optimized version:

public string name { get; set; } = "qwerty";
private const string Separator = "::";
private char[] SeparatorChars = Separator.ToCharArray();

[Benchmark]
public string GetSubName()
{
    if (name.Contains(Separator))
    {
        return name.Split(SeparatorChars)[0];
    }
    
    return "INVALID";
}

[Benchmark]
public string GetSubNameOpt()
{
    var index = name.AsSpan().IndexOf(SeparatorChars);
    if (index >= 0)
    {
        return name.AsSpan().Slice(0, index).ToString();
    }
    
    return "INVALID";
}
Method Mean Error StdDev
GetSubName 8.850 ns 0.2388 ns 0.3856 ns
GetSubNameOpt 8.430 ns 0.2337 ns 0.2400 ns
0
radfast On

Worth noting that by default, IndexOf(string needle) uses StringComparison.CurrentCulture which is almost certainly going to be lower performance than StringComparison.Ordinal. And probably also not what the code author wants, unless the two strings happen to be localized text strings.