Make a set prefix-free

341 views Asked by At

Is there a standard or best algorithm to make a given set of strings prefix-free? That is, given a set of strings, throw out all strings that have a (shorter) prefix also in that set.

In case it matters, I'm ultimately gonna implement this in Python 2.7.

2

There are 2 answers

4
aghast On BEST ANSWER
strings = ['a', 'apple', 'b', 'beta', 'c', 'd']

def prefices_only(strlist):
    ordered = sorted(strlist)
    last = ordered[0]
    results = [last]

    for c in ordered:
        if not c.startswith(last):
            last = c
            results.append(c)

    return results

print(prefices_only(strings))
10
j_random_hacker On

[EDIT: Discard strings that have (not that are) prefixes]

  1. Sort the strings in increasing length order.
  2. Insert each string into a trie. If the insertion of a character would create a new child node for a currently childless (i.e., leaf) node, discard the current string -- it has a prefix.

[EDIT: Fixed time complexity]

The first step takes O(n log n) time to sort the n strings. If the average string length exceeds log(n), then this time complexity is dominated by the second step, which takes time (and space) linear in the total size of all the input strings. It's pretty easy to implement too.