Longest Increasing Subsequence code in O(N)?

403 views Asked by At

Someone asked me a question

Find the longest alphabetically increasing or equal string 
composed of those letters. Note that you are allowed to drop 
unused characters.

So ghaaawxyzijbbbklccc returns aaabbbccc.

Is an O(n) solution possible?

and I implemented it code [in python]

s = 'ghaaawxyzijbbbklccc'
lst = [[] for i in range(26)]

for ch in s:
    ml = 0
    for i in range(0,ord(ch) + 1 - ord('a')):
        if len(lst[i]) > len(lst[ml]):
            ml= i
    cpy = ''.join(lst[ml])
    lst[ord(ch) - ord('a')] = cpy + ch

ml = 0
for i in range(26):
    if len(lst[i]) > len(lst[ml]):
        ml = i
print lst[ml]

and the answer is 'aaabbbccc'

I have tried this some more examples and all works! and as far as I can think the complexity of this code is O(N) let's take an example suppose I have a string 'zzzz' so the main loop will run 4 times and internal loop will run 26 times for each iteration so we can say in worst case the code will run in

O(26*N + 26)
---------^-
this is the last iteration

so O(N) is acceptable?

Now questions are

  1. Is it works in O(N) my code at ideone
  2. If it works in O(N) then why to use DP of O(N2) code of DP
  3. Is it better then this code Friends code
  4. Limitations of this code
2

There are 2 answers

1
J Richard Snape On BEST ANSWER
  1. It's O(N)
  2. 'why to use DP of O(N2)' : You don't need to for this problem. Note, though, that you take advantage of the fact that your sequence tokens (letters) are finite - so you can set up a list to hold all the possible starting values (26) and you need only look for the longest member of that list - an O(1) operation. A more generalised solution for sequences with an arbitrary number of ordered tokens can be done in O(NlogN).
  3. Your friend's code is basically the same, just mapping the letters to numbers and their list for the 26 starting places holds 26 numbers for letter counts - they don't need to do either of those. Conceptually, though, it's the same thing - holding a list of lists.

    "Better" is a matter of opinion. Although it has the same asymptotic complexity, the constant terms may be different, so one may execute faster than the other. Also, in terms of storage, one may use very slightly more memory than the other. With such low n - judging which is more readable may be more important than the absolute performance of either algorithm. I'm not going to make a judgement.

    You might notice a slight difference where the "winning" sequence is a tie. For instance - on the test string edxeducation that you have there - your implementation returns ddin whereas your friend's returns ddio. Both seem valid to me - without a rule to break such ties.

  4. The major limitation of this code is that it can only cope with sequences composed entirely of letters in one particular case. You could extend it to cope with upper and lower case letters, either treating them the same, or using an ordering where all lower case letters were "less than" all upper case letters or something similar. This is just extending the finite set of tokens that it can cope with.

    To generalise this limitation - the code will only cope with finite sets of sequence tokens as noted in 2. above. Also - there is no error handling, so if you put in a string with, say, digits or punctuation, it will fail.

4
Simon On

This is a variation of the Longest Increasing Subsequence. The difference is that your elements are bounded, since they can only run from 'a' to 'z'. Your algorithm is indeed O(N). O(N log N) is possible, for example using the algorithm from the link above. The bound on the number of possible elements turns this into O(N).