Data Structures and Algorithms B
  • Introduction
  • Stable Marrage
  • Huffman Codes
    • ABL
    • Implementation
  • Graph Algorithms
    • Connect Component
    • Bipartiteness
    • Strongly Connected Components
      • Implementation
    • Topological Sort
      • Implementation 1
      • Implementation 2
    • Dijkstra’s Shortest Path
    • Minimum Spanning Tree
      • Prims
        • Implementation 1
      • Kruskels
        • Implementation 1
      • Traveling Salesman Problem
    • k-Clustering
    • Dynamic Programming with Trees
      • Implementation 1
      • Implementation 2
    • Disjoint Sets
      • Implementation
    • Eularian Cycle
      • Implementation
    • Hamiltonian Path
      • Implementation
  • Divide and Conquer
    • Merge Sort
    • Integer Multiplication
    • Closest Pair
    • Master Method
  • Dynamic Programming
    • Interval Scheduling
    • Knapsack Problem
  • Network Flow
    • Maximum Network Flow
    • Improvements
    • Image Segmentation
  • String Algorithms
    • Z Algorithm
    • Boyer-Moore
    • Knuth-Morris-Pratt
    • Suffix Trees
      • Naive Implementation
      • Ukkonens Algorithm
        • Implementation
      • Applications
        • Longest Common Substring
        • Longest Palindromic Substring
        • Longest Repeated Substring
  • Randomized Algorithms
    • Bloom Filters
      • Implementation
Powered by GitBook
On this page

Was this helpful?

  1. String Algorithms
  2. Suffix Trees
  3. Applications

Longest Common Substring

PreviousApplicationsNextLongest Palindromic Substring

Last updated 5 years ago

Was this helpful?

Problem: Given two strings ‘X’ and ‘Y’, return the longest common substring.

Input : X = "abcdxyz", y = "xyzabcd"
Output : "abcd"

Input : X = "zxabcdezy", y = "yzabcdezx"
Output : "abcdez"

Solution

Assume strings s1 = "abab" and s2 = "baba"

S1 + S2 = a b a b $ b a b a #
Index   = 1 2 3 4 5 6 7 8 9 0

First build a generalized suffix tree that includes both strings. For this, you should add a unique delimiter to each string before making the generalized tree.

Then search for the deepest internal node in the tree that contains both delimiters within its subtree. The path label along this tree will be LCS. In our case, our tree returned two valid solutions: 'aba' or 'bab'.

Why this works

A internal node in a suffix tree carries that property that it shares a common suffix from its parent. Therefore the deeper we move down the tree, the longer the path label gets are is shared by at least on other suffix. Searching for both delimiters in the tree is a way of confirm that both strings have visited the internal node, and are therefore both guaranteed to share a common substring.