I have a pair of strings, for example: abcabcabc
and abcxxxabc
and a List of Common Substring Pairs (LCSP), in this case LCSP is 6 pairs, because three abc
in the first string map to two abc
in the second string. Now I need to find the longest valid (incrementing) sequence of pairs, in this case there are three equally long solutions: 0:0,3:6; 0:0,6:6; 3:0,6:6
(those numbers are starting positions of each pair in the original strings, the length of substrings is 3 as length of "abc"). I would call it the Longest Substring Pair Sequence or LSPQ. (Q is not to confuse String and Sequence)
Here is the LCSP for this example:
LCSP('abcabcabc', 'abcxxxabc') =
[ [ 6, 6, 3 ],
[ 6, 0, 3 ],
[ 3, 6, 3 ],
[ 0, 6, 3 ],
[ 3, 0, 3 ],
[ 0, 0, 3 ] ]
LSPQ(LCSP('abcabcabc', 'abcxxxabc'), 0, 0, 0) =
[ { a: 0, b: 0, size: 3 }, { a: 3, b: 6, size: 3 } ]
Now I find it with brute force recursively trying all combinations. So I am limited to about 25 pairs, otherwise it is unpractical. Size=[10,15,20,25,26,30], Time ms = [0,15,300,1000,2000,19000]
Is there a way to do that in linear time or at least not quadratic complexity so that longer input LCSP (List of Common Substring Pairs) could be used.
This problem is similar to the "Longest Common Subsequence", but not exactly it, because the input is not two strings but a list of common substrings sorted by their length. So I do not know where to look for an existing solutions or even if they exist.
Here is my particular code (JavaScript):
function getChainSize(T) {
var R = 0
for (var i = 0; i < T.length; i++) R += T[i].size
return R
}
function LSPQ(T, X, Y, id) {
// X,Y are first unused character is str1,str2
//id is current pair
function findNextPossible() {
var x = id
while (x < T.length) {
if (T[x][0] >= X && T[x][1] >= Y) return x
x++
}
return -1
}
var id = findNextPossible()
if (id < 0) return []
var C = [{a:T[id][0], b:T[id][1], size:T[id][2] }]
// with current
var o = T[id]
var A = C.concat(LSPQ(T, o[0]+o[2], o[1]+o[2], id+1))
// without current
var B = LSPQ(T, X, Y, id+1)
if (getChainSize(A) < getChainSize(B)) return B
return A
}