How to efficiently find the index of a value in a System.Numerics.Vector<T>?


I am exploring System.Numerics.Vector with .NET Framework 4.7.2 (the project I am working on cannot yet be migrated to .NET Core 3 to use the new Intrinsics namespace). The project processes very large CSV/TSV files, and we spend a lot of time looping through strings to find commas, quotes, etc. I am trying to speed up that process.

So far, I have been able to use Vector to identify whether a string contains a given character (using the EqualsAny method). That’s great, but I want to go a little further: I want to efficiently find the index of that character using Vector, and I do not know how. Below is the function I use to determine whether a string contains a comma.

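// Requires: using System; using System.Numerics; using System.Runtime.InteropServices;
// Every lane holds ',' so that EqualsAny can compare a whole block of characters at once.
private static readonly Vector<ushort> Commas = new Vector<ushort>((ushort)',');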
public static bool HasCommas(this string s)
{
    if (s == null)
    {
        return false;
    }

    ReadOnlySpan<char> charSpan = s.AsSpan();
    ReadOnlySpan<Vector<ushort>> charAsVectors = MemoryMarshal.Cast<char, Vector<ushort>>(charSpan);
    foreach (Vector<ushort> v in charAsVectors)
    {
        bool foundCommas = Vector.EqualsAny(v, StringExtensions.Commas);
        if (foundCommas)
        {
            return true;
        }
    }

    int numberOfCharactersProcessedSoFar = charAsVectors.Length * Vector<ushort>.Count;
    if (s.Length > numberOfCharactersProcessedSoFar)
    {
        for (int i = numberOfCharactersProcessedSoFar; i < s.Length; i++)
        {
            if (s[i] == ',')
            {
                return true;
            }
        }
    }

    return false;
}

I understand that I could use the function above and scan the resulting Vector, but it would defeat the purpose of using a Vector. I heard about the new Intrinsics library that could help, but I cannot upgrade my project to .NET Core 3.

Given a Vector, how would you efficiently find the position of a character? Is there a clever trick that I am not aware of?
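For reference, here is a minimal sketch of the fallback I am trying to improve on. It is not a confirmed best practice: it uses EqualsAny only to locate the block that contains a match, then scans just that one block element by element. The class name VectorIndexOfSketch and the method IndexOfChar are hypothetical names made up for illustration, and I am assuming the System.Memory and System.Numerics.Vectors packages are referenced, as in the function above.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

// Hypothetical helper class, for illustration only.
public static class VectorIndexOfSketch
{
    // Returns the index of the first occurrence of target in s, or -1 if absent.
    public static int IndexOfChar(string s, char target)
    {
        if (s == null)
        {
            return -1;
        }

        ReadOnlySpan<char> chars = s.AsSpan();
        ReadOnlySpan<Vector<ushort>> blocks = MemoryMarshal.Cast<char, Vector<ushort>>(chars);
        Vector<ushort> needle = new Vector<ushort>((ushort)target);
        int lanes = Vector<ushort>.Count;

        for (int b = 0; b < blocks.Length; b++)
        {
            // Vectorized test: does any lane of this block equal the target?
            if (Vector.EqualsAny(blocks[b], needle))
            {
                // Scalar scan limited to the single block known to contain a match.
                int start = b * lanes;
                for (int i = start; i < start + lanes; i++)
                {
                    if (chars[i] == target)
                    {
                        return i;
                    }
                }
            }
        }

        // Tail: characters that did not fill a whole vector.
        for (int i = blocks.Length * lanes; i < chars.Length; i++)
        {
            if (chars[i] == target)
            {
                return i;
            }
        }

        return -1;
    }
}
```

For example, VectorIndexOfSketch.IndexOfChar("a,b", ',') returns 1. The scalar scan runs at most once per call and touches at most Vector<ushort>.Count characters, so the vectorized loop still handles the bulk of the string. What I am really after is a way to avoid even that final scan.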
