Given a sequence of numbers, for example: 1, 2, 1, 2.
Is there any well-known algorithm to detect repetitions and group them together such that the resulting sequence will have the shortest possible size ?

For example, for the previous sequence the result would be (1, 2)x2.

More examples:

Input: 1, 1, 1, 2, 1, 1, 1, 2
Output: ((1)x3, 2)x2

Input: 1, 2, 1, 2, 1, 2
Output: (1, 2)x3

Input: 1, 1, 1, 2, 1, 2
Output: (1)x2, (1, 2)x2

The length of the result (e.g. (1, 2)x2 ) does not include any side information regarding the grouping and repetition (i.e. ignore (),x and the number after x).

For example, the length of (1, 2)x2 is actually 2. The length of ((1)x3, 2)x2 is still 2, since we consider only the number of elements that belongs to the original sequence (in this case 1 and 2).

1 Answers

aropan On

You can use the dynamic programming method. Let's define n as length input sequence and DP[i][j] as the minimum possible length into which the substring will be compressed begin with the index i and ending in the index j. Then there are two cases:

  • consistently glue: DP[i][j] = min(DP[i][k] + DP[k + 1][j]) for all k from i to j - 1;

  • repetitions: DP[i][j] = min(DP[i][k]) for all such k which are divide a substring i..j into identical substrings length k - i + 1. I think the minimum will be in the lowest possible value of k.

Of the two options, choose the minimum. The string itself can also be restored (it can be stored additionally and also recalculated). Initial data DP[i][i] = 1 for all i from 1 to n. The answer is in DP[1][n] (if use 1-index arrays).