LeetCode | 3. Longest Substring Without Repeating Characters

July 20, 2023 · Source: 听这一刻的晚风

Problem link: https://leetcode.com/problems/longest-substring-without-repeating-characters/

Difficulty: Medium
Problem description:

Given a string, find the length of the longest substring without repeating characters.

Example 1:

Input: "abcabcbb"
Output: 3
Explanation: The answer is “abc”, with the length of 3.

Example 2:

Input: "bbbbb"
Output: 1
Explanation: The answer is “b”, with the length of 1.

Example 3:

Input: "pwwkew"
Output: 3
Explanation: The answer is “wke”, with the length of 3.
Note that the answer must be a substring; “pwke” is a subsequence, not a substring.

Related topics: Hash Table, Two Pointers, String, Sliding Window

Approach 1

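For each character in the string, find the longest substring that starts at that character and satisfies the condition, keeping track of the longest one seen during the traversal. The implementation relies on member functions of C++'s string.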

My first implementation, shown below, turned out to be both slow and memory-hungry. Let $n$ be the length of s and $k$ the length of the longest substring without repeating characters; the same notation is used below.
Time complexity: $O(n \cdot k \cdot k) = O(nk^2)$ (for each of the $n$ start positions there are up to $k + 1$ inner iterations, each copying and scanning up to $k$ characters)
Space complexity: $O(k)$

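// C++
#include <string>
using namespace std;

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        int max_len = 0;
        for (int i = 0; i < s.size(); i++) {
            // advance j until s[j] already appears in s[i, j)
            int j;
            for (j = i+1; j < s.size(); j++) {
                // a fresh copy of s[i, j) is built on every iteration
                string sub_s = s.substr(i, j-i);
                if (sub_s.find(s[j]) != string::npos) {
                    break;
                }
            }
            // s[i, j) is the longest repeat-free substring starting at i
            int len = j - i;
            if (len > max_len) {
                max_len = len;
            }
        }
        return max_len;
    }
};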

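The code above is very slow and uses a lot of memory. The likely culprit is the line string sub_s = s.substr(i, j-i); it allocates a new string and copies j-i characters of s into it on every iteration of the inner loop. To optimize, we can store the current substring separately and append one character at a time, instead of re-extracting it from s at every step. This removes the repeated allocations and copies. Because find still scans the stored substring, the asymptotic bound does not change, but the constant factor is considerably smaller.
Time complexity: $O(nk^2)$
Space complexity: $O(k)$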

// C++
#include <string>
using namespace std;

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        // max_substr: longest repeat-free substring found so far
        // temp_substr: repeat-free substring starting at i, built one char at a time
        string max_substr = s.substr(0, 1), temp_substr = "";
        for (int i = 0; i < s.size(); i++) {
            temp_substr.clear();
            temp_substr.append(1, s[i]);
            for (int j = i+1; j < s.size(); j++) {
                if (temp_substr.find(s[j]) != string::npos) {
                    // s[j] would repeat: temp_substr cannot be extended further
                    if (temp_substr.size() > max_substr.size()) {
                        max_substr = temp_substr;
                    }
                    break;
                } else {
                    temp_substr.append(1, s[j]);
                    if (temp_substr.size() > max_substr.size()) {
                        max_substr = temp_substr;
                    }
                }
            }
        }
        return max_substr.size();
    }
};
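The per-start idea of Approach 1 does not actually need any substring objects. As a minimal sketch (the function name longestPerStart and the 256-entry lookup table are assumptions of this example, not the author's code), each start position can be scanned with a table of characters already seen:

```cpp
#include <string>
using namespace std;

// Hypothetical helper: same per-start scan as Approach 1, but membership
// is tracked in a 256-entry table instead of calling string::find.
int longestPerStart(const string& s) {
    int n = s.size();
    int max_len = 0;
    for (int i = 0; i < n; i++) {
        bool seen[256] = {false};  // characters contained in s[i, j)
        int j = i;
        // extend j while s[j] has not been seen since position i
        while (j < n && !seen[(unsigned char)s[j]]) {
            seen[(unsigned char)s[j]] = true;
            j++;
        }
        // s[i, j) is the longest repeat-free substring starting at i
        if (j - i > max_len) max_len = j - i;
    }
    return max_len;
}
```

Each start costs at most $k + 1$ steps (where $k$ is the answer's length) plus clearing the table, so this sketch runs in $O(n(k + 256))$ time and allocates no strings.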

Approach 2

In Approach 1, every candidate substring starts out one character long. Suppose that, partway through the traversal, we have already found a provisional longest substring of length $L$. Since the problem only asks for the length of the longest substring, there is no need to examine substrings shorter than $L$ any more.

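So we can picture the whole string as a long row of cells, each holding one character. We slide a frame, initially one cell wide, along the row; the frame must never contain a repeated character. As soon as it does, we skip past the repeat and keep sliding right. The frame may grow but never shrink, so when it reaches the end of the string, its width is the length of the longest substring.
Time complexity: $O(nk)$ (i only moves forward, and each step scans a window of at most $k$ characters)
Space complexity: $O(k)$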

// C++
#include <string>
using namespace std;

// Returns true if s contains any character more than once.
bool contain_repeat_chars(const string& s)
{
    int m[256] = {0};
    for (int i = 0; i < s.size(); i++) {
        // unsigned char keeps the index non-negative for non-ASCII bytes
        unsigned char c = s[i];
        if (m[c] > 0) {
            return true;
        } else {
            m[c] = 1;
        }
    }
    return false;
}

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        // substr is the window s[i - substr.size(), i); it only grows or slides
        string substr = s.substr(0, 1);
        for (int i = 1; i < s.size(); ) {
            size_t found_pos = substr.find(s[i]);
            if (found_pos != string::npos) {
                if (found_pos == 0) {
                    // the repeat is the window's first char: slide right by one
                    substr.erase(0, 1);
                    substr.append(1, s[i]);
                    i++;
                    continue;
                } else {
                    // shift the window (same width) to start just past the earlier copy
                    int substr_len = substr.size();
                    i += found_pos + 1;
                    if (i >= s.size()) {
                        return substr.size();
                    } else {
                        substr = s.substr(i - substr_len, substr_len);
                    }
                    // if substr contains repeat chars, keep moving forward
                    while (contain_repeat_chars(substr)) {
                        substr.erase(0, 1);
                        substr.append(1, s[i]);
                        i++;
                        if (i >= s.size()) {
                            return substr.size();
                        }
                    }
                }
            } else {
                // s[i] is new to the window: grow it
                substr.append(1, s[i]);
                i++;
            }
        }
        return substr.size();
    }
};
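The grow-only window can also be tracked without copying substrings. A minimal sketch, assuming a hypothetical function longestGrowOnly and 256 per-character counters (not the author's code): the window s[left, right] grows when the new character keeps it repeat-free and otherwise slides right by one, so it may temporarily hold repeats but never shrinks.

```cpp
#include <string>
using namespace std;

// Hypothetical helper: grow-only sliding window with per-character counts.
int longestGrowOnly(const string& s) {
    int count[256] = {0};  // occurrences of each character in the window
    int dup = 0;           // number of characters occurring at least twice
    int left = 0;
    int n = s.size();
    for (int right = 0; right < n; right++) {
        unsigned char c = s[right];
        if (++count[c] == 2) dup++;
        if (dup > 0) {
            // window holds a repeat: slide it instead of growing it
            unsigned char d = s[left++];
            if (--count[d] == 1) dup--;
        }
    }
    // the window only grew while it was repeat-free, so its size is the answer
    return n - left;
}
```

Because the window's size increases only when it holds no repeats, the final size n - left is the answer; each character is added and removed at most once, giving $O(n)$ time.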

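In the string-based implementation above, the window is held in a string, and as it slides we must check whether it contains repeated characters, which is why the helper function contain_repeat_chars is needed. If you would rather not test the window for repeats, you can instead record the largest window size seen so far and let the window shrink:
Time complexity: $O(nk^2)$ (after a repeat is found, the window restarts just past the earlier copy, so up to $k$ characters may be rescanned, each with an $O(k)$ find)
Space complexity: $O(k)$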

// C++
#include <string>
using namespace std;

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        if (s.size() == 0) {
            return 0;
        }
        // substr is the window s[i - substr.size(), i); it may shrink
        string substr = s.substr(0, 1);
        int max_len = 1;
        for (int i = 1; i < s.size();) {
            size_t found_pos = substr.find(s[i]);
            if (found_pos != string::npos) {
                if (found_pos == 0) {
                    // the repeat is the window's first char: slide right by one
                    substr.erase(0, 1);
                    substr.append(1, s[i]);
                    i++;
                    continue;
                } else {
                    // restart with a one-char window just past the earlier copy of s[i]
                    int new_start = i - substr.size() + found_pos + 1;
                    i = new_start + 1;
                    if (i < s.size()) {
                        substr = s.substr(new_start, 1);
                    }
                }
            } else {
                // s[i] is new to the window: grow it
                substr.append(1, s[i]);
                i++;
            }
            if ((int)substr.size() > max_len) {
                max_len = substr.size();
            }
        }
        return max_len;
    }
};
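The shrinking variant can also avoid rescanning characters. As a sketch (the name longestShrinking and the boolean table in_window are hypothetical, not the author's code), this is the common form in which the left edge advances one character at a time until the incoming character is no longer in the window:

```cpp
#include <string>
using namespace std;

// Hypothetical helper: shrinking sliding window over s[i, j].
int longestShrinking(const string& s) {
    bool in_window[256] = {false};  // is this character inside s[i, j)?
    int n = s.size();
    int max_len = 0;
    for (int i = 0, j = 0; j < n; j++) {
        unsigned char c = s[j];
        // shrink from the left until s[j] can be added without a repeat
        while (in_window[c]) {
            in_window[(unsigned char)s[i]] = false;
            i++;
        }
        in_window[c] = true;
        if (j - i + 1 > max_len) max_len = j - i + 1;
    }
    return max_len;
}
```

Each character enters and leaves the window at most once, so the sketch runs in $O(n)$ time; the inner while loop is the part a last-index table can replace with a single jump.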

Approach 3

Approach 2 uses a string as the data structure for the window. Other LeetCode solutions follow a similar sliding-window idea but keep the window's characters in a hash table, which improves efficiency further: checking whether a character is in the window takes $O(1)$ instead of a scan.
Time complexity: $O(n)$
Space complexity: $O(m)$ ($m$ is the size of the alphabet)

// C++ implementation, adapted from LeetCode's official Solution
// https://leetcode.com/problems/longest-substring-without-repeating-characters/solution/
#include <cstring>
#include <string>
using namespace std;

class Solution {
public:
    int lengthOfLongestSubstring(string s) {
        // hash[c] holds the index of the latest occurrence of character c in s;
        // 128 entries assume ASCII input (letters, digits, symbols, spaces)
        int hash[128];
        // initialize all hash elements to -1 (every byte set to 0xFF)
        memset(hash, -1, sizeof(hash));
        int max_len = 0;
        // i and j indicate the window range; i <= j always, so testing j suffices
        for (int i = 0, j = 0, len = 0; j < s.size(); j++) {
            int found_index = hash[(unsigned char)s[j]];
            if (found_index != -1 && found_index >= i) {
                // s[j] is already in the window: move its left edge past the old copy
                i = found_index + 1;
            }
            hash[(unsigned char)s[j]] = j;
            len = j - i + 1;
            if (len > max_len) {
                max_len = len;
            }
        }
        return max_len;
    }
};

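In the code above, the hash table hash[128] maps each character (its ASCII code is the key) to the index of its most recent occurrence in s (the value). The variables i and j mark the range of the current window (substring) in s. Although hash[128] records every character seen so far, a character belongs to the current window only if its stored index lies in [i, j) at the moment s[j] is examined, which is exactly what the check found_index >= i tests.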
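The fixed 128-entry array ties Approach 3 to ASCII input. As a hedged variant (the name longestWithMap is hypothetical, not the author's code), the same last-index technique works with std::unordered_map, at the cost of hashing overhead:

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
using namespace std;

// Hypothetical helper: last-index sliding window without a fixed alphabet size.
int longestWithMap(const string& s) {
    unordered_map<char, int> last;  // character -> index of its latest occurrence
    int max_len = 0;
    for (int i = 0, j = 0; j < (int)s.size(); j++) {
        auto it = last.find(s[j]);
        if (it != last.end() && it->second >= i) {
            // s[j] is inside the window [i, j): jump past its previous copy
            i = it->second + 1;
        }
        last[s[j]] = j;
        max_len = max(max_len, j - i + 1);
    }
    return max_len;
}
```

Space is then proportional to the number of distinct characters in s rather than to a fixed table size.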

March 30, 2019

    Original author: 听这一刻的晚风
    Original URL: https://www.jianshu.com/p/1235305ff327