Data Structures and Algorithms B
  • Introduction
  • Stable Marrage
  • Huffman Codes
    • ABL
    • Implementation
  • Graph Algorithms
    • Connect Component
    • Bipartiteness
    • Strongly Connected Components
      • Implementation
    • Topological Sort
      • Implementation 1
      • Implementation 2
    • Dijkstra’s Shortest Path
    • Minimum Spanning Tree
      • Prims
        • Implementation 1
      • Kruskels
        • Implementation 1
      • Traveling Salesman Problem
    • k-Clustering
    • Dynamic Programming with Trees
      • Implementation 1
      • Implementation 2
    • Disjoint Sets
      • Implementation
    • Eularian Cycle
      • Implementation
    • Hamiltonian Path
      • Implementation
  • Divide and Conquer
    • Merge Sort
    • Integer Multiplication
    • Closest Pair
    • Master Method
  • Dynamic Programming
    • Interval Scheduling
    • Knapsack Problem
  • Network Flow
    • Maximum Network Flow
    • Improvements
    • Image Segmentation
  • String Algorithms
    • Z Algorithm
    • Boyer-Moore
    • Knuth-Morris-Pratt
    • Suffix Trees
      • Naive Implementation
      • Ukkonens Algorithm
        • Implementation
      • Applications
        • Longest Common Substring
        • Longest Palindromic Substring
        • Longest Repeated Substring
  • Randomized Algorithms
    • Bloom Filters
      • Implementation
Powered by GitBook
On this page

Was this helpful?

  1. String Algorithms
  2. Suffix Trees

Naive Implementation

PreviousSuffix TreesNextUkkonens Algorithm

Last updated 5 years ago

Was this helpful?

This is also known as a keyword tree, but it accomplishes the same task.

Naive

Disadvantages:

  • Not compressed, traversals are redundant in some parts.

  • Use of literal characters rather than indices, so its also not space efficient.

  • O(n2)O(n^2)O(n2) time complexity. For construction of single word of size ∣P∣|P|∣P∣, every nnn suffixes of quantity PPP are inserted character by character by traversing down through the tree.

Advantages:

  • Less error prone

  • Simple to implement

  • Works

#define ALPHA_SIZE 26
#define EOC '$'

struct Node {
    Node() {
        // assumption: the number unique characters does not 
        // extend beyond the english alphabet
        children.reserve(ALPHA_SIZE);
    }

    unordered_map<char, Node*> children;
    int index;
};

class Suffix_Tree {
public:
    Suffix_Tree(const string word) : _word(word) {
        _root = new Node();
        const size_t size = word.size();

        for (int i = 0; i < size - 1; i++) {
            _insert(word.substr(i, size - i));
        }
    }

    void bfs_pretty_print() 
    {
        if (!_root) return;

        queue<Node*> q;
        q.push(_root);
        q.push(nullptr);

        while (!q.empty()) {
            Node* cur_top = q.front();
            q.pop();

            if (cur_top) {
                for (auto i : cur_top->children) {
                    cout << i.first << ' ';
                    if (i.first == EOC) {
                        cout << "(" << _word.size() - cur_top->index << ") ";
                    }
                }

                for (auto i : cur_top->children) 
                    q.push(i.second);
            }
            else {
                if (!q.empty())
                    q.push(nullptr);
                cout << endl;
            }
        }
    }

private:
    Node *_root;
    const string _word;

    void _insert(const string name) 
    {
        int depth_count = 0;
        Node *iter = _root;
        for (char c : name) {
            depth_count++;

            if (iter->children.find(c) == iter->children.end()) {
                Node *newNode = new Node();
                iter->children.insert({ c, newNode });

                if (c == EOC) {
                    // making 1-indexed
                    iter->index = depth_count + 1;
                }
            }

            iter = iter->children.find(c)->second;

        }
    }
};

int main() {
    string test_str = "aabaacab$";
    Suffix_Tree tree(test_str);
    tree.bfs_pretty_print();
}

Output

a b c
a b c a $ (8) a
b c a $ (7) a a b
a a a b c $ (6)
a b c $ (5) a
c $ (4) a b
a b $ (3)
b $ (2)
$ (1)

Implementation Details

How this algorithm works can be summarized in the _insert method. Basically, we are recreating a trie with a few minor changes such as the special character delimiter, and insertion of additional suffixes.

A trie is essentially a n-ary tree. Since we know that each node that carries a character is guarenteed to be unique, we can substitute a vector with an unordered_map. For each word that gets inserted, its characters guide its traversal down the tree. If no node is found, a new one gets created. The leaf nodes, indicated by the special character delimiter carry the occurrences of a suffix.