Length of the Longest Common Substring without repeating characters

1.2k views Asked by At

Given "abcabcbb", the answer is "abc", which the length is 3.

Given "bbbbb", the answer is "b", with the length of 1.

Given "pwwkew", the answer is "wke", with the length of 3. Note that the answer must be a substring, "pwke" is a subsequence and not a substring.

I have came up with a solution that worked, but failed for several test cases. I then found a better solution and I rewrote it to try and understand it. The solution below works flawlessly, but after about 2 hours of battling with this thing, I still can not understand why this particular line of code works.

import java.util.*;
import java.math.*;

public class Solution {

  public int lengthOfLongestSubstring(String str) {

  if(str.length() == 0)
    return 0;

  HashMap<Character,Integer> map = new HashMap<>();
  int startingIndexOfLongestSubstring = 0;
  int max = 0;

  for(int i = 0; i < str.length(); i++){
      char currentChar = str.charAt(i); 
      if(map.containsKey(currentChar))
         startingIndexOfLongestSubstring = Math.max(startingIndexOfLongestSubstring, map.get(currentChar) + 1);

      map.put(currentChar, i);
      max = Math.max(max, i - startingIndexOfLongestSubstring + 1);

      }//End of loop

    return max;

   }
}

The line in question is

max = Math.max(max, i - startingIndexOfLongestSubstring + 1);

I don't understand why this works. We're taking the max between our previous max, and the difference between our current index and the starting index of what is currently the longest substring and then adding 1. I know that the code is getting the difference between our current index, and the startingIndexOfSubstring, but I can't conceptualize WHY it works to give us the intended result; Can someone please explain this step to me, particularly WHY it works?

2

There are 2 answers

1
Kishore Bandi On BEST ANSWER

I'm usually bad at explaining, let me give it a shot by considering an example.

String is "wcabcdeghi".

Forget the code for a minute and assume we're trying to come up with a logic.

  1. We start from w and keep going until we reach c -> a -> b -> c.
    We need to stop at this point because "c" is repeating. So we need a map to store if a character is repeated. (In code : map.put(currentChar, i); )
  2. Now that we know if a character is repeated, We need to know what is the max. length so far. (In code -) max
  3. Now we know there is no point in keeping track of count of first 2 variables w->c. This is because including this, we already got the Max. value. So from next iteration onwards we need to check length only from a -> b -> soon.
    Lets have a variable (In code -)startingIndexOfLongestSubstring to keep track of this. (This should've been named startingIndexOfNonRepetativeCharacter, then again I'm bad with naming as well).
  4. Now we again keep continuing, but wait we still haven't finalized on how to keep track of sub-string that we're currently parsing. (i.e., from abcd...)
    Coming to think of it, all I need is the position of where "a" was present (which is startingIndexOfNonRepetativeCharacter) so to know the length of current sub-string all I need to do is (In code -)i - startingIndexOfLongestSubstring + 1 (current character position - The non-repetative character length + (subtraction doesn't do inclusive of both sides so adding 1). Lets call this currentLength
  5. But wait, what are we going to do with this count. Every time we find a new variable we need to check if this currentLength can break our max. So (In code -) max = Math.max(max, i - startingIndexOfLongestSubstring + 1);
  6. Now we've covered most of the statements that we need and according to our logic everytime we encounter a variable which was already present all we need is startingIndexOfLongestSubstring = map.get(currentChar). So why are we doing a Max?
    Consider a scenario where String is "wcabcdewghi". when we start processing our new counter as a -> b -> c -> d -> e -> w At this point our logic checks if this character was present previously or not. Since its present, it starts the count from index "1". Which totally messes up the whole count. So We need to make sure, the next index we take from map is always greater than the starting point of our count(i.e., select a character from the map only if the character occurs before startingIndexOfLongestSubstring).

Hope I've answered all lines in the code and mainly If the explanation was understandable.

2
Dmitry Gorkovets On

Because

i - startingIndexOfLongestSubstring + 1

is amount of characters between i and startingIndexOfLongestSubstring indexes. For example how many characters between position 2 and 3? 3-2=1 but we have 2 characters: on position 2 and position 3.

I've described every action in the code:

public class Solution {

    public int lengthOfLongestSubstring(String str) {

        if(str.length() == 0)
            return 0;

        HashMap<Character,Integer> map = new HashMap<>();
        int startingIndexOfLongestSubstring = 0;
        int max = 0;

        // loop over all characters in the string
        for(int i = 0; i < str.length(); i++){
            // get character at position i
            char currentChar = str.charAt(i);
            // if we already met this character
            if(map.containsKey(currentChar))
                // then get maximum of previous 'startingIndexOfLongestSubstring' and 
                // map.get(currentChar) + 1 (it is last occurrence of the current character in our word before plus 1)
                // "plus 1" - it is because we should start count from the next character because our current character 
                // is the same
                startingIndexOfLongestSubstring = Math.max(startingIndexOfLongestSubstring, map.get(currentChar) + 1);

            // save position of the current character in the map. If map already has some value for current character 
            // then it will override (we don't want to know previous positions of the character)
            map.put(currentChar, i);
            // get maximum between 'max' (candidate for return value) and such value for current character
            max = Math.max(max, i - startingIndexOfLongestSubstring + 1);

        }//End of loop

        return max;

    }
}