Longest Repeated Substring

Problem Find the longest common substring shared within the same word.

Examples

LRS in AAAAAAAAAA is : AAAAAAAAA
LRS in ABCDEFG is    : No repeated substring
LRS in ABABABA is    : ABABA
LRS in ATCGATCGA is  : ATCGA

Solution

Assume S = "ABABABA". Begin by building the suffix tree for S.

S = A B A B A B A
I = 1 2 3 4 5 6 7

Next, return the path label to the deepest internal node.

Why this works

As mentioned before, an internal node carries the property that it is shares a common substring with its parent. In addition, it branches out to at least two other occurrences. So the deepest node will not necessarily the longest path label, or the most repeated substring, but the longest substring that is repeated at least 2 times.

A new branch is created per new substring suffix. The deeper into the tree, the more characters are shared with another substring in the tree. Therefore the deepest node will contain the longest shared occurrence.

Last updated

Was this helpful?