LeetCode 3. Longest Substring Without Repeating Characters

Hash Set + 双指针,其实又花了不少的时间重新理解了一下优化方法……却是很久没刷题了。

Question

Given a string, find the length of the longest substring without repeating characters.

Example 1:

Input: “abcabcbb”
Output: 3
Explanation: The answer is “abc”, with the length of 3.

Example 2:

Input: “bbbbb”
Output: 1
Explanation: The answer is “b”, with the length of 1.

Example 3:

Input: “pwwkew”
Output: 3
Explanation: The answer is “wke”, with the length of 3.
Note that the answer must be a substring, “pwke” is a subsequence and not a substring.


Solution

En

In this case, using a hash set to store the substring of the string is a better choice since the look up time for hash set is .

The approach here is to use sliding window to solve the problem. For a string/array, which starting indices is , and end indices is . Initially, . Then keep adding the element on into hash set, and moving the to the right until the element exists in the hash set. At this point, the length becomes to local maximum, therefore needs to be moved to right, and the element needs to be dropped.

Optimized Method

Instead of using hash set, hash map or primitive type table(if charset is known and relative small) is used in the optimized method.

Save the index of next index of current index in the map. During each iteration, check whether the current value on index has already been saved in container. If so, which means all the elements to the index of that value contains repeated value, and therefore do not need to be considered anymore.

Consider the string abcade, the longest substring without repeated value is bcade. For the first a, the map stores next index of a, which is 1. When the iteration hits second a, it draws out the value of index stored on a, and starts from there. If there is any repeated value, this algorithm does the same thing by skipping several values and therefore save time.


CN

在这题里,用hash set来存储数据结构是一个比较好的选择,毕竟hash set的查询时间是

具体解决这个问题的技巧叫做sliding window。具体的实现方法为,设字符串/数组的左边指针为,右边指针为,在最初阶段,。在遍历字符串/数组的过程中,如果当前值不存在于hash set中,将当前值存入hash set中,并继续将继续向右移动,直至当前数值存在于hash set中。此时删除hash set中存在的数值,并将也向右移动。

优化方案

优化方案中并没有使用hash set去存储数据,而是用了hash map或者是表(在知道编码格式以及编码格式相对而言比较小的前提下)。

在这个方案之中,将数组的值和当前值得下一个值得索引以键值对的模式存储在hash map之中。在遍历的过程中,检查当前数值是否已经存在于hash map之中。如果不存在于容器中,则将当前数值和索引加入hash map中,反之,则说明当前值已经存在于子字符串之中,可以直接获取所对应的索引,将移到那个位置上。

例如说abcadeaf,第一个a存进hash map中的数值为1,也就是第一个b所在的位置。所以当检测到第二个a存在的时候,可以直接移动到1,并且从bca开始寻找不重复的数值。但是当数组遍历到第三个a时,该方法并非开始另一个遍历过bc, a,而是直接跳过bca,从dea开始。

这是因为重复的字符既然是a,那么只要a还在子字符串中,那么子字符串就永远也不会是没有重复字符的子字符串。


Code

Java

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
public class Solution {
public int lengthOfLongestSubstring(String s) {
int n = s.length();
Set<Character> set = new HashSet<>();
int ans = 0, i = 0, j = 0;
while (i < n && j < n) {
// try to extend the range [i, j]
if (!set.contains(s.charAt(j))) {
set.add(s.charAt(j++));
ans = Math.max(ans, j-i);
} elst {
set.remove(s.charAt(i++));
}
}
return ans;
}
}

Optimized

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
public class Solution {
// using hash map
public int lengthOfLongestSubstring(String s) {
int n = s.length(), ans = 0;
// current index of character
Map<Character, Integer> map = new HashMap<>();
// try to extend the range [i, j]
for (int j = 0; i = 0; j < n; j++) {
if (map.containsKey(s.charAt(j))) {
i = Math.max(map.get(s.charAt(j)), i);
}
ans = Math.max(ans, j - i + 1);
map.put(s.charAt(j), j + 1);
}
return ans;
}

// Assuming ASCII 128
// int[26] for Letters 'a' - 'z' or 'A' - 'Z'
// int[128] for ASCII
// int[256] for Extended ASCII
public int lengthOfLongestSubstring2(String s) {
int n = s.length(), ans = 0;
// current index of character
int[] index = new int[128];
// try to extend the range [i, j]
for (int j = 0, i = 0; j < n; j++) {
i = Math.max(index[s.charAt(j)], i);
ans = Math.max(ans, j - i + 1);
index[s.charAt(j)] = j + 1;
}
return ans;
}
}

Python3

1
2
3
4
5
6
7
8
9
10
11
12
# already optimized solution
class Solution:
def lengthOfLongestSubstring(self, s: str) -> int:
used = {}
max_length = start = 0
for i, c in enumerate(s):
if c in used and start <= used[c]:
start = used[c] + 1
else:
max_length = max(max_length, i - start + 1)
used[c] = i
return max_length

Complexity Analysis

  • Time Complexity: , in the worst case, all the elements need to be iterated twice by and
  • Space Complexity: , where is the size of charset/slphabet, and is the size of the string.

Reference

Solution

A Python solution - 85ms - O(n)>)

-------The end of this article  Thank you for your reading-------
0%