My problem is similar to finding the longest repeating non-overlapping sub-strings but in this manner. Example, my input array is 0,0,0,1,2,1,1,2,1,1,2,1,2,0,0
The longest repeating and non-overlapping sub-string in this input is 121
with the count as 3
Now, remove the occurrences of 121
from the input array and replace them with -1
.
My input now becomes: 0,0,0,-1,-1,-1,2,0,0
Now again find the same sub-strings.
Note: -1 shall not be included in the answer. So, the next longest non-overlapping and repeating sub-string is 00
with count as 2
.
Remove occurrences of 0,0 from the input which then becomes: 0,-1,-1,-1,2,-1
The only occurrences now are 0 and 2 which are of length 1. So, here we stop.
I am only able to think of brute force which will be O(n3). Tried thinking of suffix trees too but don't know how to include the non-overlapping condition in it.
My very naive implementation:
Because I want my substring to have count as atleast 2, it can have length maximum as N/2 (N is the size of input array)
for len = N/2 to 1:
// find all sub-strings of length as "len" in the input sequence
for i=0 to N-len+1
// the sub-string is s[i... (i+len)]
// find non-overlapping count of this sub-string in the input sequence using KMP
int count = find_Count();
// take the substring with maximum count, let it be "r"
end
modify the array to place replace non-overlapping occurrences of "r" with -1
end
As you can see above, the complexity is O(N3).
Any optimal solution, hints or explanation can be of some help.
Thanks