Boyer-Moore
Is another efficient string pattern matching algorithm that is the standard benchmark from practical string search literature. Here is some insight.
Consider the naive approach that compares fix pattern P with every substring of T of length
As we can see w
and o
have been matched, but u
and r
have failed. Instead of moving one more alignment as the naive algorithm does, we can show that skipping 2 alignments can guarantee no fault in the general algorithm. This is because u
does not exist anywhere in P
, so it should be safe just moving past P
after u
.
Boyer-Moore uses this principles as one of its heuristics for good character matching. The algorithm is able to determine from its character comparisons whether or not it can skip other pointless comparisons.
Rules
The search string
T
is read from left to right, but the character comparisons are made from right to left.
The Bad Character Rule: Upon mismatch of a character (1) if the character is not in the string
P
move the alignment pasted the mismatched character, or (2) if the character is otherwise in the string, move the alignment to the first character that aligns with the mismatched character (iterating right to left).
The Good Suffix Rule: let
t
=the substring that is currently matched with betweenT
andP
. (1) Skip to there is a match with betweent
andT
, or (2)P
moves pastt
.
Putting it Togeher
Both the bad character rule and good suffix rule are used to skip alignments in the search text, that the naive algorithm would have otherwise ignored. Using both rules, Boyer Moore optimizes the amount of skips by taking maximum of both rules.
Time Complexity: Space Complexity: , where defines the alphabet of T
.
Implementation Details
In practice we need to be able to immediately know the length to skip in T
. To do this, a preprocessing step must be done to produce a 2 dimensional lookup table given P
and T
, for both rule BC and GS. For example:
Last updated
Was this helpful?