string - 如何将给定文本分解为字典中的单词？

这是一道面试题。假设您有一个字符串 text 和一个 dictionary(一组字符串)。如何将 text 分解为子字符串，以便在 dictionary 中找到每个子字符串。

例如，您可以使用 /usr 将 "thisisatext" 分解为 ["this", "is", "a", "text"]/share/dict/words.

我相信回溯可以解决这个问题(在伪 Java 中):

void solve(String s, Set<String> dict, List<String> solution) {
   if (s.length == 0)
      return
   for each prefix of s found in dict
      solve(s without prefix, dict, solution + prefix)
}

List<String> solution = new List<String>()
solve(text, dict, solution)

有意义吗？你会优化在字典中搜索前缀的步骤吗？你会推荐什么数据结构？

最佳答案

此解决方案假设字典存在 Trie 数据结构。此外，对于 Trie 中的每个节点，承担以下功能:

node.IsWord() :如果到该节点的路径是一个单词，则返回 true
node.IsChild(char x):如果存在带有标签 x 的 child 则返回 true
node.GetChild(char x):返回标签为x的子节点

Function annotate( String str, int start, int end, int root[], TrieNode node):
i = start
while i<=end:
    if node.IsChild ( str[i]):
        node = node.GetChild( str[i] )
        if node.IsWord():
            root[i+1] = start
        i+=1
    else:
        break;

end = len(str)-1
root = [-1 for i in range(len(str)+1)]
for start= 0:end:
    if start = 0 or root[start]>=0:
        annotate(str, start, end, root, trieRoot)

index  0  1  2  3  4  5  6  7  8  9  10  11
str:   t  h  i  s  i  s  a  t  e  x  t
root: -1 -1 -1 -1  0 -1  4  6 -1  6 -1   7

我将留给您通过反向遍历根来列出组成字符串的单词的部分。

时间复杂度为 O(nk)，其中 n 是字符串的长度，k 是字典中最长单词的长度。

PS:我假设字典中有以下单词:this、is、a、text、ate。

关于string - 如何将给定文本分解为字典中的单词？，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/8793387/

string - 如何将给定文本分解为字典中的单词？

上一篇：algorithm - O(NlogN) 在数组中找到 3 个具有任意 T 之和的数字

下一篇：string - 句子的词级编辑距离