Efficiently find all elements of a List<string> that start with another string

1.9k views Asked by At

I have a list of strings. I want to find all of the strings that start or end with another string. At its simplest an example is;

List<string> allWords = new List<string>();

for(int index = 0; index < 1000000; index++)
    allWords.Add(index.ToString());

List<string> result = allWords.FindAll(x => x.StartsWith("10") || x.EndsWith("10"));

This algorithm scans the list from beginning to end. I need to perform this operation very quickly and O(n) is too slow.

What data structures (if any) are available to me to solve this algorithm faster that O(n)?

2

There are 2 answers

0
Thomas Levesque On BEST ANSWER

If you have an unsorted List<string>, there is no way to do it in less than O(n). However, you could use a different data structure. A trie (also called prefix tree) is particularly well suited for your need, as it has a O(m) search complexity (where m is the length of the searched prefix)

I have a C# implementation here : Trie.cs (actually, it's a trie-based dictionary, which associates a value with each key, but for your use-case you can just ignore the value; or if you prefer you can adapt the implementation to your needs).

0
dbc On
  1. To find strings starting with a given substring, sort the list, do a binary search to find the closest match, then scan adjacent strings to find others that also match the beginning. That's log(n)

  2. To find strings ending with a given substring, create a list of reversed strings, and sort that list. Then to find a string that ends in a given pattern, reverse the pattern and look for reversed strings that start with the reversed pattern, as in step 1.