Given a string sequence which contains only four letters, ['a','g','c','t']
for example: agggcttttaaaatttaatttgggccc.
Find all the shortest unique sub-string of the string sequence which are of equal length (the length should be minimum of all the unique sub-strings) ?
For example : aaggcgccttt
answer: ['aa', 'ag', 'gg','cg', 'cc','ct']
explanation:shortest unique sub-string of length 2
I have tried using suffix-arrays coupled with longest common prefix but i am unable to draw the solution perfectly.


I'm not sure what you mean by "minimum unique sub-string", but looking at your example I assume you mean "shortest runs of a single letter". If this is the case, you just need to iterate through the string once (character by character) and count all the shortest runs you find. You should keep track of the length of the minimum run found so far (infinity at start) and the length of the current run.If you need to find the exact runs, you can add all the minimum runs you find to e.g. a list as you iterate through the string (and modify that list accordingly if a shorter run is found).
EDIT: I thought more about the problem and came up with the following solution.
We find all the unique sub-strings of length i (in ascending order). So, first we consider all sub-strings of length 1, then all sub-strings of length 2, and so on. If we find any, we stop, since the sub-string length can only increase from this point.
You will have to use a list to keep track of the sub-strings you've seen so far, and a list to store the actual sub-strings. You will also have to maintain them accordingly as you find new sub-strings.
Here's the Java code I came up with, in case you need it:
The
uniqueStringslist contains all the unique sub-strings of minimum length (used for output). ThealreadySeenlist keeps track of all the sub-strings that have already been seen (used to exclude repeating sub-strings).