Shortest common SuperSequence

1.4k views Asked by At

The problem is that given 2 strings X and Y , we need to find length of the shortest sequence Z such that both the strings occur as sub sequence in Z. Now, I get the intuition that length = |X| + |Y| -|LCS(X,Y)|. But how do we prove it ?

Ex: X = AGGTAB , Y = GXTXAYB , then Z = AGXGTXAYB and |Z| = 9 . LCS(X,Y) = GTAB

Reference : Link 1 Link 2

3

There are 3 answers

0
Number945 On BEST ANSWER

Let be string 1 and be string 2.

Let S be any shortest super sequence which consist of both X and Y as sequences. Let's try to create such a sequence.

Clearly , X exists as a sequence in S. Hence, initially S can be shown as :


Note: '...' means that place can be empty or can be used to place characters of Y into this to make it shortest super sequence consisting of both X and Y.

Now, this |S| has to be minimum. We need to inject only minimum characters from Y into this S to make it shortest super sequence consisting of both X and Y. Also, note the condition is that Y needs to appear as a sub sequence and not substring. Hence, if I find any sequence of characters of Y which also occur in X, then those characters of Y need not be injected.

Hence, for |S| to be minimum , we shown that minimum characters of Y need to be injected so that Y occurs as a sequence. For this, we find such longest sequence of X which also occur in Y which in other words = LCS(X,Y).

|S| = |X| + ( |Y| - |LCS(X,Y)| ) = |X| +|Y| - |LCS(X,Y)|

Now, multiple LCS(X,Y) exists. Take any and construct the sequence.

4
PartlyCloudy On

First, looking at the second link you send, it is possible to create a super sequence which is of size |X| + |Y| - |LCS(X,Y)|:

For two input sequences, an scs can be formed from a longest common subsequence (lcs) easily...

So now all that remains is to prove that it is actually the shortest common supersequence. Assuming the contrary, suppose that there is a shortest common supersequence such that its length is |X| + |Y| - |LCS(X,Y)| - 1 == |X| + |Y| - (|LCS(X,Y)| + 1). But in this string, you have X as a subsequence, and Y as a subsequence. That means that they intersect in |LCS(X,Y)| + 1 places judging by the size of the string. That is there is a LCS of size |LCS(X,Y)| + 1, a contradiction to the definition of the LCS!

Hence, the size is exactly |X| + |Y| - |LCS(X,Y)|. q.e.d

0
grovkin On

Since you only care about the counts of the letters, you can sort all the sequences (X, Y, Z and LCS(X,Y) ). This is because sorting sequences (after figuring out minimal contaning one and the LCS) will keep the counts of the letters the same.

If you are considering sorted sequences, you only need to consider sequences formed from 1 letter. This is because the count of each letter in each sequence is independent of the count of all other letters in each sequence.

Now if you consider sequences consisting of only one letter, then it should be obvious that the minimal sequence which contains both X and Y as a subsequence, will be either X or Y while LCS(X,Y) will be the other one. So (adapting notation "MCS" for minimal containing sequence), |MCS(X,Y)| + |LCS(X,Y)| = |X| + |Y|.