python - 检查是否可以使用提供的列表中的单词将字符串拆分为句子

我最近偶然发现了一项编码任务，而且我一直在努力把它做好。它是这样的:

给定一个非空字符串 s 和一个包含非空单词列表的列表 word_list，确定 s可以被分割成一个或多个字典单词的空格分隔序列。您可能会假设 word_list 不包含重复项，但每个单词都可以使用多次。

例如，给定:

s = 'whataniceday'
word_list = ['a', 'what', 'an', 'nice', 'day']

返回True，因为'whataniceday'可以被分割为'what a nice day'。

我想出了一个非常天真的解决方案，它适用于这个特定的例子，但要让它失败并不难，例如将一个词添加到 word_list 列表中的另一个词以(即 ['a', 'wha', 'what', 'an', 'nice', 'day'])开头。还有很多其他事情可能会弄乱我的解决方案，但无论如何:

s = "whataniceday"
word_list = ["h", "a", "what", "an", "nice", "day"]

def can_be_segmented(s, word_list):
    tested_str = s
    buildup_str = ''

    for letter in tested_str:        
        buildup_str += letter

        if buildup_str not in word_list:
            continue

        tested_str = tested_str[len(buildup_str):]
        buildup_str = ''

    return bool(tested_str == '' and buildup_str == '')

print(can_be_segmented(s, word_list))

你们知道如何解决这个问题吗？或者也许有更好的方法来解决这个问题？

最佳答案

>>> import re
>>> s = 'whataniceday'
>>> word_list = ['a', 'what', 'an', 'nice', 'day']
>>> re.match('^(' + '|'.join(f'({s})' for s in word_list) + ')*$', s)
<_sre.SRE_Match object; span=(0, 12), match='whataniceday'>

作为函数:

import re
def can_be_segmented(s, word_list):
    pattern = re.compile('^(' + '|'.join(f'({s})' for s in word_list) + ')*$')
    return pattern.match(s) is not None

使组成为非捕获((?:word) 而不是 (word) 可能是一种优化，以便 re.match 不必跟踪匹配的单词，但我不会计时。

如果您的单词不仅仅是字母，您可能希望通过 re.escape() 传递它们(如 f'({re.escape(s)})' 而不是 f'({s})')。

如果您要混合大小写并且希望它们匹配，请传递 re.IGNORECASE 或 re.I 标志(如 pattern. match(s, re.I) 而不是 pattern.match(s))。

参见 re documentation了解更多。

关于python - 检查是否可以使用提供的列表中的单词将字符串拆分为句子，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/51163397/

python - 检查是否可以使用提供的列表中的单词将字符串拆分为句子

上一篇：python - 在单元格中的第一个字母之后拆分 Pandas 数据框列(一分为二)

下一篇：python - 从 numpy 中的一维向量映射二维延迟向量