Naive Implementation
This is also known as a keyword tree, but it accomplishes the same task.
Naive
Disadvantages:
Not compressed, traversals are redundant in some parts.
Use of literal characters rather than indices, so its also not space efficient.
time complexity. For construction of single word of size , every suffixes of quantity are inserted character by character by traversing down through the tree.
Advantages:
Less error prone
Simple to implement
Works

#define ALPHA_SIZE 26
#define EOC '$'
struct Node {
Node() {
// assumption: the number unique characters does not
// extend beyond the english alphabet
children.reserve(ALPHA_SIZE);
}
unordered_map<char, Node*> children;
int index;
};
class Suffix_Tree {
public:
Suffix_Tree(const string word) : _word(word) {
_root = new Node();
const size_t size = word.size();
for (int i = 0; i < size - 1; i++) {
_insert(word.substr(i, size - i));
}
}
void bfs_pretty_print()
{
if (!_root) return;
queue<Node*> q;
q.push(_root);
q.push(nullptr);
while (!q.empty()) {
Node* cur_top = q.front();
q.pop();
if (cur_top) {
for (auto i : cur_top->children) {
cout << i.first << ' ';
if (i.first == EOC) {
cout << "(" << _word.size() - cur_top->index << ") ";
}
}
for (auto i : cur_top->children)
q.push(i.second);
}
else {
if (!q.empty())
q.push(nullptr);
cout << endl;
}
}
}
private:
Node *_root;
const string _word;
void _insert(const string name)
{
int depth_count = 0;
Node *iter = _root;
for (char c : name) {
depth_count++;
if (iter->children.find(c) == iter->children.end()) {
Node *newNode = new Node();
iter->children.insert({ c, newNode });
if (c == EOC) {
// making 1-indexed
iter->index = depth_count + 1;
}
}
iter = iter->children.find(c)->second;
}
}
};
int main() {
string test_str = "aabaacab$";
Suffix_Tree tree(test_str);
tree.bfs_pretty_print();
}
Output
a b c
a b c a $ (8) a
b c a $ (7) a a b
a a a b c $ (6)
a b c $ (5) a
c $ (4) a b
a b $ (3)
b $ (2)
$ (1)
Implementation Details
How this algorithm works can be summarized in the _insert
method. Basically, we are recreating a trie with a few minor changes such as the special character delimiter, and insertion of additional suffixes.
A trie is essentially a n-ary tree. Since we know that each node that carries a character is guarenteed to be unique, we can substitute a vector
with an unordered_map
. For each word that gets inserted, its characters guide its traversal down the tree. If no node is found, a new one gets created. The leaf nodes, indicated by the special character delimiter carry the occurrences of a suffix.
Last updated
Was this helpful?