I have a string S
which consists of a
's and b
's. Perform the below operation once. Objective is to obtain the lexicographically smallest string.
Operation: Reverse exactly one substring of S
e.g.
- if
S = abab
thenOutput = aabb
(reverseba
of stringS
) - if
S = abba
thenOutput = aabb
(reversebba
of stringS
)
My approach
Case 1: If all characters of the input string are same then output will be the string itself.
Case 2: if S
is of the form aaaaaaa....bbbbbb....
then answer will be S
itself.
otherwise: Find the first occurence of b
in S
say the position is i. String S
will look like
aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
|
i
In order to obtain the lexicographically smallest string the substring that will be reversed starts from index i. See below for possible ending j.
aa...bbb...aaaa...bbbb....aaaa....bbbb....aaaaa...
| | | |
i j j j
Reverse substring S[i:j]
for every j and find the smallest string.
The complexity of the algorithm will be O(|S|*|S|)
where |S|
is the length of the string.
Is there a better way to solve this problem? Probably O(|S|)
solution.
What I am thinking if we can pick the correct j
in linear time then we are done. We will pick that j where number of a
's is maximum. If there is one maximum then we solved the problem but what if it's not the case? I have tried a lot. Please help.
TL;DR: Here's an algorithm that only iterates over the string once (with O(|S|)-ish complexity for limited string lengths). The example with which I explain it below is a bit long-winded, but the algorithm is really quite simple:
So you're basically comparing the value of the string if it were reversed up to the current point, with the value of the string if it were only reversed up to a (so-far) optimal point, and updating this optimal point on-the-fly.
Here's a quick code example; it could undoubtedly be coded more elegantly:
(The code could be simplified by comparing
reverse
andvalue
whenever a zero is found, and not just when the end of a maximally long sequence of zeros is encountered.)You can create an algorithm that only iterates over the input once, and can process an incoming stream of unknown length, by keeping track of two values: the value of the whole string interpreted as a reverse (lsb-to-msb) binary number, and the value of the string with one part reversed. Whenever the reverse value goes below the value of the stored best end-point, a better end-point has been found.
Consider this string as an example:
or, written with zeros and ones for simplicity:
We iterate over the leading zeros until we find the first one:
This is the start of the sequence we'll want to reverse. We will start interpreting the stream of zeros and ones as a reversed (lsb-to-msb) binary number and update this number after every step:
Then at every step, we double the unit and update the reverse number:
At this point we find a one, and the sequence of zeros comes to an end. It contains 2 zeros, which is currently the maximum, so we store the current position as a possible end-point, and also store the current reverse value:
Then we go on iterating over the string, but at every step, we update the value of the possible endpoint, but now as a normal (msb-to-lsb) binary number:
At this point we find that we have a sequence of 3 zeros, which is longer that the current maximum of 2, so we throw away the end-point we had so far and replace it with the current position and reverse value:
And then we go on iterating over the string:
Here we find that we have another sequence with 3 zeros, so we compare the current reverse value with the end-point's value, and find that the stored endpoint has a lower value:
so we keep the end-point set to position 10 and we go on iterating over the string:
And we find another sequence of 3 zeros, so we compare this position to the stored end-point:
and we find it has a smaller value, so we replace the possible end-point with the current position:
And then we iterate over the final digit:
So at the end we find that position 21 gives us the lowest value, so it is the optimal solution:
Here's a C++ version that uses a vector of bool instead of integers. It can parse strings longer than 64 characters, but the complexity is probably quadratic.