438 Find All Anagrams in a String

Given a string s and a non-empty string p, find all the start indices of p's anagrams ins.

Strings consists of lowercase English letters only and the length of both stringssandpwill not be larger than 20,100.

The order of output does not matter.

Example 1:

Input:

s: "cbaebabacd" p: "abc"


Output:

[0, 6]


Explanation:

The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".

Example 2:

Input:

s: "abab" p: "ab"


Output:

[0, 1, 2]


Explanation:

The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".

The Idea

This is probably the most creative solution I've come up with for a leetcode problem. I also think it has the potential to beat top submissions given a few optimizations, and perhaps no account on the preprocessing step of performing a uniform distribution.
My first thought was simply iterate through the string, and sum together the integers. e.g.

in = apple
sub = ppl

int key = p + p + l;

a + p + p = key ? NO
p + p + l = key ? YES
p + l + e = ket ? NO

This works because of the associative property of addition.
But the main problem with this is that multiple permutation sums can produce the same key identifier. Consider:

in = 'af'
sub = 'be'

key = b + e

a + f = key? YES

Why this happens:
*    * 
abcdef
 *  *

a is smaller than b by 1, but f is greater than e by 1, so their sums are the same.

So to avoid this, why won't develop our own random mapping for the alphabet? The greater our bounds for a random number, the less likely it is that the sum of a word will equal the sum of another word, unless every single character in the word hashes the exact location. We can produce use a uniform distribution to that every number without the bounds of our range has an equal probability of becoming selected. Finally, we perform the same algorithm as above.

unordered_map<char, int> generate_char_map()
{
    unordered_map<char, int> alpha_map;
    alpha_map.reserve(26);

    random_device rd;
    mt19937 eng(rd());
    uniform_int_distribution<> u_dis(0, INT_MAX);

    string alphabet = "abcdefghijklmnopqrstuvwxyz";
    for (char c : alphabet) 
        alpha_map.insert({ c, u_dis(eng) });

    return alpha_map;
}

vector<int> findAnagrams(string &in,  string &sub) 
{
    vector<int> anagram_i;
    if (sub.length() > in.length() || in.empty() || sub.empty())
        return anagram_i;

    unordered_map<char, int> alpha_map = generate_char_map();

    long long sub_key = 0;
    for (int i = 0; i < sub.length(); i++)
        sub_key += alpha_map[sub[i]];


    long long temp_key = 0;

    // base case: first sub.length() chars
    for (int i = 0; i < sub.length(); i++) 
        temp_key += alpha_map[in[i]];

    int back_ptr = 0;
    if (temp_key == sub_key)
        anagram_i.push_back(back_ptr);

    // every other case
    for (int i = sub.length(); i < in.length(); i++) {
        // remove old
        temp_key -= alpha_map[in[back_ptr++]];

        // add new
        temp_key += alpha_map[in[i]];
        if (temp_key == sub_key)
            anagram_i.push_back(back_ptr);
    }

    return anagram_i;
}

int main()
{
    string one3 = "af", two3 = "be";
    vector<int> test3 = findAnagrams(one3, two3);
    print(test3);

    string one = "cbaebabacd", two = "abc";
    vector<int> test = findAnagrams(one, two);
    print(test);

    string one1 = "abab", two1 = "ab";
    vector<int> test1 = findAnagrams(one1, two1);
    print(test1);

    string one2 = "", two2 = "a";
    vector<int> test2 = findAnagrams(one2, two2);
    print(test2);
}

$O(n^2)$ solution with constant space. (too slow)

vector<int> findAnagrams( string &in,  string &sub) 
{
    sort(sub.begin(), sub.end());

    string subtemp;
    vector<int> anagram_i;
    const int go_to = in.length() - sub.length();

    for (int i = 0; i <= go_to; i++) {
        subtemp = in.substr(i, sub.length());
        sort(subtemp.begin(), subtemp.end());
        if (subtemp == sub) 
            anagram_i.push_back(i);
    }

    return anagram_i;
}

int main()
{
    string one2 = "", two2 = "a";
    vector<int> test2 = findAnagrams(one2, two2);
    print(test2);

    string one = "cbaebabacd", two = "abc";
    vector<int> test = findAnagrams(one, two);
    print(test);

    string one1 = "abab", two1 = "ab";
    vector<int> test1 = findAnagrams(one1, two1);
    print(test1);
}

Previous435 Non-overlapping Intervals Next441 Arranging Coins

Last updated 5 years ago