Given a sequence of numbers, for example: 1, 2, 1, 2
.
Is there any well-known algorithm to detect repetitions and group them together such that the resulting sequence will have the shortest possible size ?
For example, for the previous sequence the result would be (1, 2)x2
.
More examples:
Input: 1, 1, 1, 2, 1, 1, 1, 2
Output: ((1)x3, 2)x2
Input: 1, 2, 1, 2, 1, 2
Output: (1, 2)x3
Input: 1, 1, 1, 2, 1, 2
Output: (1)x2, (1, 2)x2
EDIT:
The length of the result (e.g. (1, 2)x2
) does not include any side information regarding the grouping and repetition (i.e. ignore (),x
and the number after x
).
For example, the length of (1, 2)x2
is actually 2.
The length of ((1)x3, 2)x2
is still 2, since we consider only the number of elements that belongs to the original sequence (in this case 1 and 2).
You can use the dynamic programming method. Let's define
n
as length input sequence andDP[i][j]
as the minimum possible length into which the substring will be compressed begin with the indexi
and ending in the indexj
. Then there are two cases:consistently glue:
DP[i][j] = min(DP[i][k] + DP[k + 1][j])
for allk
fromi
toj - 1
;repetitions:
DP[i][j] = min(DP[i][k])
for all suchk
which are divide a substringi..j
into identical substrings lengthk - i + 1
. I think the minimum will be in the lowest possible value ofk
.Of the two options, choose the minimum. The string itself can also be restored (it can be stored additionally and also recalculated). Initial data
DP[i][i] = 1
for alli
from 1 ton
. The answer is inDP[1][n]
(if use 1-index arrays).