Longest Substring Pair Sequence is it Longest Common Subsequence or what?

247 views Asked by At

I have a pair of strings, for example: abcabcabc and abcxxxabc and a List of Common Substring Pairs (LCSP), in this case LCSP is 6 pairs, because three abc in the first string map to two abc in the second string. Now I need to find the longest valid (incrementing) sequence of pairs, in this case there are three equally long solutions: 0:0,3:6; 0:0,6:6; 3:0,6:6 (those numbers are starting positions of each pair in the original strings, the length of substrings is 3 as length of "abc"). I would call it the Longest Substring Pair Sequence or LSPQ. (Q is not to confuse String and Sequence)

Here is the LCSP for this example:

LCSP('abcabcabc', 'abcxxxabc') = 
[ [ 6, 6, 3 ],
  [ 6, 0, 3 ],
  [ 3, 6, 3 ],
  [ 0, 6, 3 ],
  [ 3, 0, 3 ],
  [ 0, 0, 3 ] ]


LSPQ(LCSP('abcabcabc', 'abcxxxabc'), 0, 0, 0) = 
   [ { a: 0, b: 0, size: 3 }, { a: 3, b: 6, size: 3 } ]

Now I find it with brute force recursively trying all combinations. So I am limited to about 25 pairs, otherwise it is unpractical. Size=[10,15,20,25,26,30], Time ms = [0,15,300,1000,2000,19000]

Is there a way to do that in linear time or at least not quadratic complexity so that longer input LCSP (List of Common Substring Pairs) could be used.

This problem is similar to the "Longest Common Subsequence", but not exactly it, because the input is not two strings but a list of common substrings sorted by their length. So I do not know where to look for an existing solutions or even if they exist.

Here is my particular code (JavaScript):

function getChainSize(T) {
    var R = 0
    for (var i = 0; i < T.length; i++) R += T[i].size
    return R
}
function LSPQ(T, X, Y, id) {
  // X,Y are first unused character is str1,str2
  //id is current pair
    function findNextPossible() {
        var x = id
        while (x < T.length) {
            if (T[x][0] >= X && T[x][1] >= Y) return x
            x++
        }
        return -1
    }
    var id = findNextPossible()
    if (id < 0) return []
    var C = [{a:T[id][0], b:T[id][1], size:T[id][2] }]
    // with current
    var o = T[id]
    var A = C.concat(LSPQ(T, o[0]+o[2], o[1]+o[2], id+1))
    // without current
    var B = LSPQ(T, X, Y, id+1)
    if (getChainSize(A) < getChainSize(B)) return B
    return A
}
0

There are 0 answers