Given a string s and a non-empty string p, find all the start indices of p's anagrams ins.
Strings consists of lowercase English letters only and the length of both stringssandpwill not be larger than 20,100.
The order of output does not matter.
Example 1:
Input:
s: "cbaebabacd" p: "abc"
Output:
[0, 6]
Explanation:
The substring with start index = 0 is "cba", which is an anagram of "abc".
The substring with start index = 6 is "bac", which is an anagram of "abc".
Example 2:
Input:
s: "abab" p: "ab"
Output:
[0, 1, 2]
Explanation:
The substring with start index = 0 is "ab", which is an anagram of "ab".
The substring with start index = 1 is "ba", which is an anagram of "ab".
The substring with start index = 2 is "ab", which is an anagram of "ab".
The Idea
This is probably the most creative solution I've come up with for a leetcode problem. I also think it has the potential to beat top submissions given a few optimizations, and perhaps no account on the preprocessing step of performing a uniform distribution.
My first thought was simply iterate through the string, and sum together the integers. e.g.
in = apple
sub = ppl
int key = p + p + l;
a + p + p = key ? NO
p + p + l = key ? YES
p + l + e = ket ? NO
This works because of the associative property of addition.
But the main problem with this is that multiple permutation sums can produce the same key identifier. Consider:
in = 'af'
sub = 'be'
key = b + e
a + f = key? YES
Why this happens:
* *
abcdef
* *
a is smaller than b by 1, but f is greater than e by 1, so their sums are the same.
So to avoid this, why won't develop our own random mapping for the alphabet? The greater our bounds for a random number, the less likely it is that the sum of a word will equal the sum of another word, unless every single character in the word hashes the exact location. We can produce use a uniform distribution to that every number without the bounds of our range has an equal probability of becoming selected. Finally, we perform the same algorithm as above.
unordered_map<char,int> generate_char_map(){ unordered_map<char,int> alpha_map;alpha_map.reserve(26); random_device rd; mt19937 eng(rd()); uniform_int_distribution<>u_dis(0, INT_MAX); string alphabet ="abcdefghijklmnopqrstuvwxyz";for (char c : alphabet) alpha_map.insert({ c,u_dis(eng) });return alpha_map;}vector<int> findAnagrams(string&in,string&sub) { vector<int> anagram_i;if (sub.length() >in.length() ||in.empty() ||sub.empty())return anagram_i; unordered_map<char,int> alpha_map =generate_char_map();longlong sub_key =0;for (int i =0; i <sub.length(); i++) sub_key +=alpha_map[sub[i]];longlong temp_key =0; // base case: first sub.length() charsfor (int i =0; i <sub.length(); i++) temp_key +=alpha_map[in[i]];int back_ptr =0;if (temp_key == sub_key)anagram_i.push_back(back_ptr); // every other casefor (int i =sub.length(); i <in.length(); i++) { // remove old temp_key -=alpha_map[in[back_ptr++]]; // add new temp_key +=alpha_map[in[i]];if (temp_key == sub_key)anagram_i.push_back(back_ptr); }return anagram_i;}intmain(){ string one3 ="af", two3 ="be"; vector<int> test3 =findAnagrams(one3, two3);print(test3); string one ="cbaebabacd", two ="abc"; vector<int> test =findAnagrams(one, two);print(test); string one1 ="abab", two1 ="ab"; vector<int> test1 =findAnagrams(one1, two1);print(test1); string one2 ="", two2 ="a"; vector<int> test2 =findAnagrams(one2, two2);print(test2);}