Is there any way of finding the longest common subsequence of two sequences in O(NlogN) time?
I read somewhere that there is a way to achieve this using binary search.
I know the DP approach that takes O(N^2) time.
Computing the longest common subsequence of two sequences is essentially an n-squared problem.
Masek and Paterson (1980) improved this slightly, to n-squared / log n, using the so-called "Four Russians" technique.
In most cases the additional complexity introduced by such convoluted approaches is not justified by the small gains. For practical purposes you can consider the n-squared approach as the reasonable optimum in typical applications.
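For reference, this is the quadratic dynamic programming baseline the question and the answers refer to, as a minimal sketch (the method name lcsLength is illustrative, and it returns only the length of the LCS):

static int lcsLength(String a, String b) {
    int n = a.length(), m = b.length();
    // dp[i][j] = length of the LCS of the first i characters of a and the first j characters of b
    int[][] dp = new int[n + 1][m + 1];
    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= m; j++) {
            if (a.charAt(i - 1) == b.charAt(j - 1)) {
                dp[i][j] = dp[i - 1][j - 1] + 1;
            } else {
                dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
            }
        }
    }
    return dp[n][m];
}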
For the general case, the O(N^2) dynamic programming algorithm is the best you can do. However, there exist better algorithms in some special cases.
One such special case is a small alphabet whose size does not grow with N. This is a very common situation: sequences consisting of letters from some alphabet (e.g. English) fall into this category. For this case, the O(N*M) algorithm can be optimised to O(N^2/logN) with the method of Four Russians. I don't know the details; you can search for it.
Another special case is when both sequences are permutations. An example problem is "Given two permutations of the numbers from 1 to N, find their LCS". This one can be solved in O(N*logN).
Let the sequences be A and B.
Define a sequence C where C[i] is the index of B[i] in A (so A[C[i]] = B[i]).
The LCS of A and B is then the longest increasing subsequence of C, which can be found in O(N*logN) using binary search.
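To make the reduction concrete, here is a minimal sketch in Java, assuming both inputs are permutations of 1..N; the class and method names (PermutationLcs, lcsOfPermutations) are illustrative. It builds C as described above and then computes the longest increasing subsequence with patience sorting, using binary search for each element:

import java.util.Arrays;

public class PermutationLcs {

    static int lcsOfPermutations(int[] a, int[] b) {
        int n = a.length;
        // positionInA[value] = index at which that value occurs in A
        int[] positionInA = new int[n + 1];
        for (int i = 0; i < n; i++) {
            positionInA[a[i]] = i;
        }
        // C[i] = index of B[i] in A; LCS(A, B) = LIS(C)
        int[] c = new int[n];
        for (int i = 0; i < n; i++) {
            c[i] = positionInA[b[i]];
        }
        // O(N*logN) longest increasing subsequence (patience sorting):
        // tails[k] = smallest possible tail of an increasing subsequence of length k+1
        int[] tails = new int[n];
        int length = 0;
        for (int x : c) {
            int pos = Arrays.binarySearch(tails, 0, length, x);
            if (pos < 0) pos = -(pos + 1); // insertion point
            tails[pos] = x;
            if (pos == length) length++;
        }
        return length;
    }

    public static void main(String[] args) {
        int[] a = {3, 1, 4, 2, 5};
        int[] b = {1, 2, 3, 4, 5};
        System.out.println(lcsOfPermutations(a, b)); // prints 3 (e.g. the subsequence 1, 4, 5)
    }
}

This sketch returns only the length; recovering the subsequence itself needs an extra array of predecessor links but does not change the O(N*logN) bound.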
Assuming the Strong Exponential Time Hypothesis (a stronger assumption than P is not equal to NP, but one that is still widely believed to be true), it is not possible to solve the problem in time O(N^{2 - eps}) for any positive constant eps; see "Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping" by Karl Bringmann and Marvin Künnemann (a pre-print is available on arXiv).
Roughly speaking, this means that the general case of the problem cannot be solved in time better than something like O(N^2/log N), so if you want faster algorithms you have to consider additional constraints (some properties of the strings) or look for an approximate solution.
The following is an O(N logM) approach to the LCS problem that uses binary search:
// Required imports and an assumed wrapper signature; the original snippet shows only the method body
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

private static String longestCommonSubsequence(String first, String second) {
    // If the strings are not the same length, take the longer one as "first"
    // Map each letter to a list of the indexes at which it occurs in "first"
    // The lists are built in increasing order, which enables binary search later on
    Map<Character, List<Integer>> map = new HashMap<>();
    for (int i = 0; i < first.length(); i++) {
        char c = first.charAt(i);
        if (!map.containsKey(c)) {
            map.put(c, new ArrayList<>());
        }
        map.get(c).add(i);
    }
    // Loop over the second string, keeping an index so that only matches preserving
    // the order of the first string are accepted (the index only moves forward)
    int index = 0;
    StringBuilder lcs = new StringBuilder();
    for (int j = 0; j < second.length(); j++) {
        char c = second.charAt(j);
        if (map.containsKey(c)) {
            List<Integer> indexes = map.get(c);
            // The following method uses binary search to find the first index that is bigger than or equal to "index"
            // This is possible since the indexes list is sorted by construction
            int firstBigger = firstBiggerOrEqualIndex(indexes, index);
            if (firstBigger != -1) {
                index = firstBigger + 1;
                lcs.append(c);
            }
        }
    }
    return lcs.toString();
}
The method firstBiggerOrEqualIndex uses Binary Search like this:
private static int firstBiggerOrEqualIndex(List<Integer> indexes, int index) {
    // Binary search for the smallest value in the sorted list that is >= index; returns -1 if none exists
    int start = 0;
    int end = indexes.size() - 1;
    int res = -1;
    while (start <= end) {
        int mid = (start + end) / 2;
        int value = indexes.get(mid);
        if (value > index) {
            // Candidate found; keep searching to the left for a smaller one
            res = value;
            end = mid - 1;
        } else if (value == index) {
            // An exact match is the smallest possible answer
            res = value;
            break;
        } else {
            start = mid + 1;
        }
    }
    return res;
}
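For reference, a hypothetical call site, assuming the main loop above is wrapped in a method named longestCommonSubsequence(String first, String second) as in the listing:

public static void main(String[] args) {
    // Per the first comment in the listing, the longer string is passed as "first"
    String first = "ABCBDAB";
    String second = "BDCAB";
    System.out.println(longestCommonSubsequence(first, second)); // prints BDAB
}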
The dynamic programming approach is O(n^2) for the general case. For certain other cases, there are lower-complexity algorithms:
For a fixed alphabet size (one that doesn't grow with n), there's the Method of Four Russians, which brings the time down to O(n^2/log n) (see here).
See here for another, further-optimized case.