Z Algorithm
Input: some string s
Output: An array Z, where every Z[i] = length of the longest substring beginning at position i that matches with a prefix of the string.
Example:
T= A B A A B
Z= 0 0 1 2 0Naive
// without dp
vector<int> get_z_array(const string &text)
{
if (text.empty()) return vector<int>();
vector<int> z_array;
z_array.reserve(text.length());
int cur_box_left, cur_box_right;
int start = 0, z_count = 0;
char start_char, end_char;
z_array.push_back(z_count);
for (int i = 1; i < text.length(); i++) {
int i_temp = i;
start_char = text[start];
end_char = text[i_temp];
while (start_char == end_char
&& i_temp < text.length()) {
z_count++;
start++;
i_temp++;
start_char = text[start];
end_char = text[i_temp];
}
z_array.push_back(z_count);
z_count = start = 0;
}
return z_array;
}In the naive approach, we iterate through the string, and at every position of the original string i, we count the number of times that the current character, matches with the start of the string (++1 as we go).
Improved
With dynamic programming, the key idea to reduce the algorithm to linear time is to reference previously calculated z-values. The algorithm begins the same way as before. There is a single iteration through the main string, and at every i the z score becomes calculated. If the z score is 1, we can immediately move on to the next character. Otherwise, if the count is greater than 1, we initiate dp then on the next run by initiating a Boolean flag.
With dp, we reference the previous z scores to determine the one that follows next (rather than recalculating it). Even though there is a second loop, the progress is still linear because i is set back to where j made progress (j - 1). The caveat we have to check for is if z score we are referencing plus the current index exceeds the right bound of the z box, then we'd have to perform additional calculations to find the final z score. This means that the z score we are referencing is guaranteed to be the lower bound for this z score we want, but the moment it exceeds the right bound of the z box, then it could potentially mean that the first character after the bound can additionally match, so we have to check for that. We begin, however, at the distance from the string that is the minimum z score (lower bound). So again, the progress is still linear. Otherwise, if there is no complication, we can simply use the corresponding the z score.
Alternative version that tells us what is going on
Last updated
Was this helpful?