Finding longest common subsequence in O(NlogN) time

16k views Asked by At

Is there any way of finding the longest common subsequence of two sequences in O(NlogN) time?

I read somewhere that there is a way to achieve this using binary search.

I know the dp approach that takes O(N2) time.

6

There are 6 answers

1
Ami Tavory On

The dynamic programming approach, which is O(n2) for general case. For certain other cases, there are lower-complexity algorithms:

  • For a fixed alphabet size (which doesn't grow with n), there's the Method of Four Russians which brings the time down to O(n2/log n) (see here).

  • See here another further optimized case.

1
Tyler Durden On

The longest common subsequence between two sequences is essentially n-squared.

Masek and Patterson (1980) made a minor improvement to n-squared / log n using the so-called "Four Russians" technique.

In most cases the additional complexity introduced by such convoluted approaches is not justified by the small gains. For practical purposes you can consider the n-squared approach as the reasonable optimum in typical applications.

1
sparik On

For the general case, the O(N^2) dynamic programming algorithm is the best you can do. However, there exist better algorithms in some special cases.

  1. Alphabet size is bounded

This is a very common situation. Sequences consisting of letters from some alphabet (e.g. English) lie in this category. For this case, the O(N*M) algorithm can be optimised to get an O(N^2/logN) with method of four Russians. I don't know how exactly, you can search for it.

  1. Both sequences consist of distinct elements

An example problem is "Given two permutations of numbers from 1 to N, find their LCS". This one can be solved in O(N*logN).
Let the sequences be A and B.
Define a sequence C. C[i] is the index of B[i] in A. (A[C[i]] = B[i])
LCS of A and B is the longest increasing subsequence of C.

0
Andrew Ryzhikov On

Assuming Exponential Time Hypothesis (which is stricter than P is not equal to NP, but is still widely believed to be true), it is not possible to do it in time O(N^{2 - eps}) for any positive constant eps, see "Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping" by Karl Bringmann and Marvin Kunnemann (pre-print on arXiv is available).

Roughly speaking, it means that the general case of this problem cannot be solved in time better than something like O(N^2/log N), so if you want faster algorithms you have to consider additional constraints (some properties of the strings) or look for approximate solution.

1
user17656768 On
vector <int> LIS;
int LongestIncreasingSubsequence(int n){
    if(!n) return 0;
    LIS.emplace_back(arr[0]);
    for(int i = 1; i < n; i++){
        if(arr[i] > LIS.back()) LIS.emplace_back(arr[i]);
        else *lower_bound(LIS.begin(), LIS.end(), arr[i]) = arr[i];
    }
    return LIS.size();
}
0
Rabih Sarieddine On

The following is a O(NLOGM) approach of the LCS problem that uses binary search:

//If the strings are not the same length, take the longest one
//Map each letter with a ArrayList representing the occurences on that letter (indexes)
//The ArrayLists are thus sorted. This will enable binary search later on
Map<Character, List<Integer>> map = new HashMap<>();
for (int i = 0; i < first.length(); i++) {
    char c = first.charAt(i);
    if (!map.containsKey(c)) {
        map.put(c, new ArrayList<>());
    }
    map.get(c).add(i);
}

//Loop on the second string and increment an index to check that any matching character is a valid match
//For sequence match we need to maintain the same order as in the initial string so we use an increasing index for that

int index = 0;
StringBuilder lcs = new StringBuilder();

for (int j = 0; j < second.length(); j++) {
    char c = second.charAt(j);
    if (map.containsKey(c)) {
        List<Integer> indexes = map.get(c);
        //The following method uses Binary Search to find the first bigger or equal index
        //This is possible since the indexes list is sorted by definition
        int firstBigger = firstBiggerOrEqualIndex(indexes, index);
        if (firstBigger != -1) {
            index = firstBigger+1;
            lcs.append(c);
        }
    }
}
return lcs.toString();

The method firstBiggerOrEqualIndex uses Binary Search like this:

private static int firstBiggerOrEqualIndex(List<Integer> indexes, int index) {

    int start = 0;
    int end = indexes.size()-1;
    int res = -1;
    while(start <= end) {
        int mid = (start+end)/2;
        int value = indexes.get(mid);
        if (value > index) {
            res = value;
            end = mid-1;
        }
        else if (value == index) {
            res = value;
            break;
        }
        else {
            start = mid + 1;
        }
    }
    return res;
}