395 Longest Substring with At Least K Repeating Characters

Find the length of the longest substring T of a given string (consists of lowercase letters only) such that every character in T appears no less than k times.

Example 1:

Input:
s = "aaabb", k = 3

Output:
3

The longest substring is "aaa", as 'a' is repeated 3 times.

Example 2:

Input:
s = "ababbc", k = 2

Output:
5

The longest substring is "ababb", as 'a' is repeated 2 times and 'b' is repeated 3 times.

The Idea: The general idea is to find the longest consequtive sequence of characters that have a quantity >= k, and then recursively iterate on those substrings if things don't work out. First, I mapped characters to thier frequencies. Then I found the longest sequences of characters that occur at least k times. I then perform a check on this substring explicitly to confirm that the local characters in the substring appear k times. If not, we can greedily iterate perform that same proceedure on that substring until we do find a valid substring match. This works because we always find the largest possible substring of at least k, and split on those to find the next smallest ones.

Complexity: ??

unordered_map<char, int> get_freq_map(const string &s) {
    unordered_map<char, int> s_freq;
    s_freq.reserve(s.size());
    for (int i = 0; i < s.size(); i++) {
        if (s_freq.find(s[i]) == s_freq.end()) {
            s_freq.insert({ s[i], 1 });
        }
        else s_freq[s[i]]++;
    }
    return s_freq;
}


bool isSubstringAtLeastK(const string &s, int k, unordered_map<char, int> &freq_map)
{
    for (char c : s) {
        if (freq_map[c] < k) return false;
    }
    return true;
}


int longestSubstring(const string &s, int k, unordered_map<char, int> s_freq) {
    // get ordered array where chars are >= k
    vector<bool> is_valid_i(s.size());
    for (int i = 0; i < s.size(); i++) {
        is_valid_i[i] = s_freq[s[i]] >= k;
    }

    // find the longest consequtive 1's
    int start = 0;
    int max_dist = 1;
    int temp_dist = 1;
    int i = 1;
    for (; i < is_valid_i.size(); i++) {
        if (is_valid_i[i - 1] == is_valid_i[i] && 
            is_valid_i[i-1] && is_valid_i[i]) {
            temp_dist++;
        }
        else if (temp_dist > max_dist) {
            max_dist = temp_dist;
            start = i - max_dist;
            temp_dist = 1;
        }
    }
    if (temp_dist > max_dist) {
        max_dist = temp_dist;
        start = i - max_dist;
    }

    // assert that the substring is valid
    string long_cons_substr = s.substr(start, max_dist);
    if (long_cons_substr.length() < k)
        return 0;
    unordered_map<char, int> sub_freq_map = get_freq_map(long_cons_substr);
    if (isSubstringAtLeastK(long_cons_substr, k, sub_freq_map))
        return max_dist;
    else return longestSubstring(long_cons_substr, k, sub_freq_map);
}


int longestSubstring(string s, int k) {
    return longestSubstring(s, k, get_freq_map(s));
}

Last updated