139 Word Break

Given a non-empty string s and a dictionary wordDict containing a list of non-empty words, determine if s can be segmented into a space-separated sequence of one or more dictionary words. You may assume the dictionary does not contain duplicate words.

For example, given
s = "leetcode",
dict = ["leet", "code"].

Return true because "leetcode" can be segmented as "leet code".

The Idea: The goal is to find a continuous set of substrings in the dictionary that can build S. One way of doing this is through a graph model. The graph itself will represent all the substrings path that exists within the dictionary to build S.

Consider the following example:

        0123456789
S    = "nightymare"
dict = { "night", "mare", "ni", "nigh", "ma", "re", "y", "ty" }

Like many algorithm questions, we can model this problem using a graph. In the context of this graph, our goal is to look for a viable path that exists within the dictionary from the parent index that creates a continuous representation of our string S. In the example above, our possible options beginning at index 0 are (night - 5), (nigh - 4), or (ni - 2). From each of these options, we explore a smaller, but identical subproblem.

Of course that can be more than one possible path that in the tree that leads to an appropriate answer. However, we are only interested IF it is possible.

A BFS procedure would allow us to quickly identify the shortest path. A DFS procedure may otherwise explore down a massive subtree that doesnt contain the ending index.

In the graph, we notice some repeating subtrees. This happens because two words in our dictionary can addup to the same index. For example, in the example above, consider the indicies 4 (nigh) and 5 (night). Following (nigh) we can take dictionary word (ty) and obtain index 6 (nighty). At the same time, (night) can take (y) to obtain the same index 6 (nighty).

In any case, we want to avoid any redundant computations, and only perform the computation down the tree once. To solve this, we maintain a visited array that tells us which indices have been explored. Doing this removes duplicate subtrees:

Complexity: O(N^3) time where N is the length of the string. We have to try and identify every substring in S, and see if it is in the dictionary which takes (n-1)+(n-2)+(n-3)... or N^2 time due to string comparison and hash. This is explored for every index of S, which is only iterated through once due to the visited set.

Might be N^2 if substring check is O(1) per string

bool wordBreak(string s, vector<string>& wordDict) {
    unordered_set<string> set;
    set.reserve(wordDict.size());
    for (string &s : wordDict) 
        set.insert(s);

    const size_t L = s.length();
    vector<bool> visited(L, false);
    queue<int> q;
    q.push(0);

    while (!q.empty()) {
        int front = q.front();
        q.pop();

        for (int j = 1; j < L - front + 1; j++) {
            string temp = s.substr(front, j);
            int index = front + j;

            if (set.find(temp) != set.end() && !visited[index - 1]) {
                if (index == L) {
                    return true;
                }
                visited[index - 1] = true;
                q.push(index);
            }
        }
    }

    return false;
}

Testing

int main()
{
    vector<string>  };
    cout << boolalpha << wordBreak("nightymare", words) << endl;

    words = { "leet", "code" };
    cout << boolalpha << wordBreak("leetcode", words) << endl;
}

Python

from queue import Queue


class Solution(object):
    def wordBreak(self, s, wordDict):
        """
        :type s: str
        :type wordDict: List[str]
        :rtype: bool
        """

        options = set(wordDict)
        target = len(s)

        q = Queue()
        q.put(0)
        visited = set([0])

        while not q.empty():
            front = q.get()

            # search through all substrings from front
            for j in range(1, target - front + 1):
                if s[front:front+j] in options:
                    if front + j == target:
                        return True
                    elif front + j not in visited:
                        visited.add(front+j)
                        q.put(front+j)
        return False

Last updated