Using StringComparer with StringBuilder to search for a string

I need to use globalization rules to search for all occurrences of a string within a document. The pseudocode is:

var searchText = "Hello, World";
var compareInfo = new CultureInfo("en-US").CompareInfo;

DocumentIterator start = null; // the start position if a match occurs
var sb = new StringBuilder();

// the document is not a string, but exposes an iterator to its content
for (var iter = doc.Start(); iter.IsValid(); ++iter)
{
    start = start ?? iter; // the start of the potential match

    var ch = iter.GetChar(); 
    sb.Append(ch);

    if (compareInfo.Compare(searchText, sb.ToString()) == 0) // exact match
    {
        Console.WriteLine($"match at {start}-{iter}");
        // not shown: continue to search for more occurrences.
    }
    else if (!compareInfo.IsPrefix(searchText, sb.ToString())) // sb is no longer a prefix of searchText
    {
        // restart the search from the character immediately following start
        sb.Clear();
        iter = start; // the loop's ++iter then moves to the character after start
        start = null;
    }
}

This delegates the difficult job of culture-sensitive string matching to CompareInfo.

However, this stream-like loop performs poorly: it calls StringBuilder.ToString() on every iteration, and each call allocates a new string and copies the whole buffer into it, which defeats the purpose of using a StringBuilder.
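For reference, the loop above can be made runnable by treating the document as a plain string and the iterator as an int index. Both are stand-ins for the hypothetical DocumentIterator, and continuing the search right after a match is an assumption filling in the "not shown" part. FindAllIncremental is a hypothetical helper name. The sketch keeps the per-iteration ToString() call so the cost being asked about is visible.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Runnable translation of the question's loop. Assumptions (not from the question):
// the document is a plain string, the "iterator" is an int index, and after a
// match the search continues with the character that follows it.
static List<(int Start, int End)> FindAllIncremental(
    string doc, string searchText, CompareInfo compareInfo)
{
    var matches = new List<(int Start, int End)>();
    var sb = new StringBuilder();
    int start = -1; // -1 plays the role of "start = null"

    for (int iter = 0; iter < doc.Length; ++iter)
    {
        if (start < 0) start = iter; // start of the potential match
        sb.Append(doc[iter]);

        // The cost in question: a new string is allocated and the whole
        // buffer is copied on every iteration.
        string candidate = sb.ToString();

        if (compareInfo.Compare(searchText, candidate) == 0) // exact match
        {
            matches.Add((start, iter)); // inclusive range, like "{start}-{iter}"
            sb.Clear();
            start = -1;
        }
        else if (!compareInfo.IsPrefix(searchText, candidate))
        {
            // restart from the character immediately following start
            sb.Clear();
            iter = start; // the loop's ++iter moves to start + 1
            start = -1;
        }
    }
    return matches;
}

var enUs = new CultureInfo("en-US").CompareInfo;
foreach (var (s, e) in FindAllIncremental("Hello, World! Hello, World!", "Hello, World", enUs))
    Console.WriteLine($"match at {s}-{e}");
```

Besides the allocations, every failed prefix sends the loop back to start + 1, so characters are re-appended and re-compared many times.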

Question: How can I do this search efficiently?

There are 2 answers

Luke Cummings (best answer)

So why not copy the whole document into a StringBuilder first and call ToString() once? Then use a similar scheme to iterate over all the candidate substrings, comparing each one in place with compareInfo.Compare(searchText, 0, searchText.Length, docString, startIndex, checkLength), so that no new string is created per candidate.
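A minimal sketch of this approach, assuming the document can be enumerated as IEnumerable<char> (a stand-in for the question's DocumentIterator). FindAllInPlace is a hypothetical helper name, and the limit of twice searchText's length on checkLength is an arbitrary assumption: culture-sensitive matches can differ in length from searchText (for example when ignorable characters are involved), so some bound on the candidate lengths has to be chosen. The comparison uses the CompareInfo.Compare overload that takes offsets, lengths and CompareOptions.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Sketch: copy the document into one string, then compare candidate
// substrings in place. Assumption: the document can be enumerated as
// IEnumerable<char> (a stand-in for the question's DocumentIterator).
static List<(int Start, int Length)> FindAllInPlace(
    IEnumerable<char> document, string searchText,
    CompareInfo compareInfo, CompareOptions options = CompareOptions.None)
{
    // One pass over the document and a single ToString() call.
    var sb = new StringBuilder();
    foreach (char ch in document) sb.Append(ch);
    string docString = sb.ToString();

    // Assumption: a culture-sensitive match is at most twice as long as
    // searchText (ignorable characters can make it longer).
    int maxLength = 2 * searchText.Length;
    var matches = new List<(int Start, int Length)>();

    int startIndex = 0;
    while (startIndex < docString.Length)
    {
        int matchLength = 0;
        int limit = Math.Min(maxLength, docString.Length - startIndex);
        for (int checkLength = 1; checkLength <= limit; checkLength++)
        {
            // Compares searchText with docString[startIndex, startIndex + checkLength)
            // without allocating a substring.
            if (compareInfo.Compare(searchText, 0, searchText.Length,
                                    docString, startIndex, checkLength, options) == 0)
            {
                matchLength = checkLength; // shortest match at this position
                break;
            }
        }

        if (matchLength > 0)
        {
            matches.Add((startIndex, matchLength));
            startIndex += matchLength; // continue after the match
        }
        else
        {
            startIndex++;
        }
    }
    return matches;
}

var enUs = new CultureInfo("en-US").CompareInfo;
var found = FindAllInPlace("Say hello, world twice: HELLO, WORLD.", "Hello, World",
                           enUs, CompareOptions.IgnoreCase);
foreach (var (start, length) in found)
    Console.WriteLine($"match at {start}, length {length}");
```

Trying the shortest checkLength first and jumping past each match means overlapping occurrences are not reported; advancing startIndex by one instead would report them.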

Peter Smith

Why not use String.IndexOf, which is culture-sensitive, and iterate through your document, starting each new IndexOf search just past the previous match until nothing is found? See the first answer here.

All you need to do to start it off is set the current culture. I'm assuming the do loop is then obvious.
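A minimal sketch of that do loop, assuming the document has already been copied into a single string as in the other answer. FindAllWithIndexOf is a hypothetical helper name. It calls the String.IndexOf(string, int, StringComparison) overload with StringComparison.CurrentCulture, which makes the dependence on the current culture explicit.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Sketch of the IndexOf loop. Assumption: the document has already been
// copied into a single string. FindAllWithIndexOf is a hypothetical helper.
static List<int> FindAllWithIndexOf(string docString, string searchText)
{
    var starts = new List<int>();
    if (searchText.Length == 0) return starts; // avoid an endless series of empty matches

    int index = 0;
    do
    {
        // Culture-sensitive search using CultureInfo.CurrentCulture.
        index = docString.IndexOf(searchText, index, StringComparison.CurrentCulture);
        if (index >= 0)
        {
            starts.Add(index);
            // Assumption: the match is searchText.Length characters long.
            index += searchText.Length;
        }
    } while (index >= 0 && index <= docString.Length);

    return starts;
}

// "Set the current culture" first, as the answer suggests.
CultureInfo.CurrentCulture = new CultureInfo("en-US");

string doc = "Hello, World! Hello, World!";
foreach (int matchStart in FindAllWithIndexOf(doc, "Hello, World"))
    Console.WriteLine($"match at {matchStart}");
```

IndexOf returns only the start of a match, so advancing by searchText.Length is a simplification: a culture-sensitive match that involves ignorable characters can be longer or shorter than searchText.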