python - How to find matches around fixed strings

Tags: python string

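I'm looking for help finding Python functions that let me take a list of strings, such as ["I like ", " and ", " because "], and a single target string, such as "I like lettuce and carrots and onions because I do", and find every way the characters of the target string can be grouped so that each string in the list appears in order.

For example: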

solution(["I like ", " and ", " because ", "do"],
         "I like lettuce and carrots and onions because I do")

should return:

[("I like ", "lettuce", " and ", "carrots and onions", " because ", "I ", "do"), 
 ("I like ", "lettuce and carrots", " and ", "onions", " because ", "I ", "do")]

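Note that in each tuple the strings from the list argument appear in order, and that the function returns every possible way of splitting the target string that achieves this.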
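The property described above can be written down as a small checker. This is only a sketch: the name `is_valid_split` is hypothetical, and it assumes that the fixed strings always sit at the even positions of a tuple, with the (possibly empty) text between them at the odd positions.

```python
def is_valid_split(parts, words, target):
    """Return True if `parts` is an acceptable grouping of `target`."""
    # The pieces must reassemble into the target with nothing lost or added.
    if "".join(parts) != target:
        return False
    # Assumption: fixed strings occupy the even indices, in their original order.
    return list(parts[::2]) == list(words)


print(is_valid_split(("take ", "Alice", " to the park"),
                     ["take ", " to the park"],
                     "take Alice to the park"))
```

A tuple that moves a character across a boundary, such as ("take ", "Alice ", "to the park"), fails the second check because "to the park" is not one of the fixed strings.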

Another example, this time with only one possible way to group the characters:

solution(["take ", " to the park"], "take Alice to the park")

should give the result:

[("take ", "Alice", " to the park")]

Here is an example where the characters cannot be grouped correctly:

solution(["I like ", " because ", ""],
         "I don't like cheese because I'm lactose-intolerant")

should return:

[]

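because there is no way to do it. Note that "I like " in the first argument cannot be split up. The target string does not contain the string "I like ", so no match is possible.

Here is one last example, again with multiple options: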

solution(["I", "want", "or", "done"],
         "I want my sandwich or I want my pizza or salad done")

should return:

[("I", " ", "want", " my sandwich ", "or", " I want my pizza or salad ", "done"),
 ("I", " ", "want", " my sandwich or I want my pizza ", "or", " salad ", "done"),
 ("I", " want my sandwich or I ", "want", " my pizza ", "or", " salad ", "done")]

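Note again that each of the strings in ["I", "want", "or", "done"] appears in order in every tuple, and that the remaining characters are grouped around those strings in every possible way. The characters themselves are never reordered; the return value is the list of all possible groupings.

Note that it is also assumed that the first string in the list appears at the start of the target string and that the last string in the list appears at the end of the target string. (If they don't, the function should return an empty list.)

Which Python functions will let me do this?

I have tried using regex functions, but they seem to fail when there is more than one option.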
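To illustrate that limitation, here is a minimal sketch. The pattern construction is an assumed reconstruction of such an attempt, not the code that was actually tried. It joins the escaped fixed strings with lazy capture groups and asks `re.fullmatch` for a match:

```python
import re

words = ["I like ", " and ", " because ", "do"]
target = "I like lettuce and carrots and onions because I do"

# Each fixed string becomes a literal group; "(.*?)" captures the text between them.
pattern = "(.*?)".join(f"({re.escape(w)})" for w in words)
match = re.fullmatch(pattern, target)

# A regex match reports a single way of satisfying the pattern.
print(match.groups())
```

The engine stops at the first grouping that works, so the second valid split ("lettuce and carrots", " and ", "onions") is never reported.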

Best answer

I have a solution that still needs a fair amount of refactoring but seems to work. I hope it helps; this was a really interesting problem.

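import itertools
import re
from collections import deque


def solution(search_words, search_string):
    # For each search word, collect the start index of every occurrence.
    found = deque()
    for search_word in search_words:
        # re.escape makes the word match literally, even if it contains regex metacharacters.
        found.append([m.start() for m in re.finditer(re.escape(search_word), search_string)])
    if not found or not all(found):
        return []  # no search words or not all words found
    # Keep only the combinations whose positions are in non-decreasing order.
    word_positions_lst = [list(i) for i in itertools.product(*found) if sorted(i) == list(i)]

    ret_lst = []
    for word_positions in word_positions_lst:
        # Flatten into [start0, end0, start1, end1, ...] cut points.
        split_positions = list(itertools.chain.from_iterable(
            (split_position, split_position + len(search_word))
            for split_position, search_word in zip(word_positions, search_words)))
        # Whatever follows the last search word (empty when that word ends the string).
        last_search_word = search_string[split_positions[-1]:]
        ret_strs = [search_string[a:b] for a, b in zip(split_positions, split_positions[1:])]
        if last_search_word:
            ret_strs.append(last_search_word)
        # Keep only splits that reassemble into the original string; this rejects
        # cases where the first word does not start at index 0 or matches overlap.
        if "".join(ret_strs) == search_string:
            ret_lst.append(tuple(ret_strs))
    return ret_lst


# Each pair below prints the actual result followed by the expected one.
print(solution(["I like ", " and ", " because ", "do"],
               "I like lettuce and carrots and onions because I do"))
print([("I like ", "lettuce", " and ", "carrots and onions", " because ", "I ", "do"),
       ("I like ", "lettuce and carrots", " and ", "onions", " because ", "I ", "do")])
print()

print(solution(["take ", " to the park"], "take Alice to the park"))
print([("take ", "Alice", " to the park")])
print()

print(solution(["I like ", " because "],
               "I don't like cheese because I'm lactose-intolerant"))
print([])
print()

print(solution(["I", "want", "or", "done"],
               "I want my sandwich or I want my pizza or salad done"))
print([("I", " ", "want", " my sandwich ", "or", " I want my pizza or salad ", "done"),
       ("I", " ", "want", " my sandwich or I want my pizza ", "or", " salad ", "done"),
       ("I", " want my sandwich or I ", "want", " my pizza ", "or", " salad ", "done")])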

Output:

[('I like ', 'lettuce', ' and ', 'carrots and onions', ' because ', 'I ', 'do'), ('I like ', 'lettuce and carrots', ' and ', 'onions', ' because ', 'I ', 'do')]
[('I like ', 'lettuce', ' and ', 'carrots and onions', ' because ', 'I ', 'do'), ('I like ', 'lettuce and carrots', ' and ', 'onions', ' because ', 'I ', 'do')]

[('take ', 'Alice', ' to the park')]
[('take ', 'Alice', ' to the park')]

[]
[]

[('I', ' ', 'want', ' my sandwich ', 'or', ' I want my pizza or salad ', 'done'), ('I', ' ', 'want', ' my sandwich or I want my pizza ', 'or', ' salad ', 'done'), ('I', ' want my sandwich or I ', 'want', ' my pizza ', 'or', ' salad ', 'done')]
[('I', ' ', 'want', ' my sandwich ', 'or', ' I want my pizza or salad ', 'done'), ('I', ' ', 'want', ' my sandwich or I want my pizza ', 'or', ' salad ', 'done'), ('I', ' want my sandwich or I ', 'want', ' my pizza ', 'or', ' salad ', 'done')]

Edit: refactored the code to use meaningful variable names.

Edit2: added the last case, which I had forgotten.
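For comparison, the same enumeration can be sketched without regular expressions, using `str.find` and recursion. This is an alternative sketch, not the code above: the name `all_splits` is hypothetical, and it assumes the rule from the question that the first string must start the target and the last string must end it. Adjacent fixed strings produce an empty string between them.

```python
def all_splits(words, target):
    """Enumerate every grouping of `target` around `words`, in order."""
    if not words or not target.startswith(words[0]):
        return []
    results = []

    def extend(index, pos, parts):
        # `pos` is where the previous fixed string ended; `parts` is the split so far.
        if index == len(words):
            if pos == len(target):  # the last fixed string must end the target
                results.append(tuple(parts))
            return
        word = words[index]
        start = target.find(word, pos)
        while start != -1:
            # Record the gap before this occurrence, then the fixed string itself.
            extend(index + 1, start + len(word), parts + [target[pos:start], word])
            start = target.find(word, start + 1)

    extend(1, len(words[0]), [words[0]])
    return results


print(all_splits(["take ", " to the park"], "take Alice to the park"))
```

Because each fixed string is searched for only after the end of the previous one, overlapping or out-of-order combinations are never generated, so no filtering step is needed afterwards.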

Source: the original question, "python - How to find matches around fixed strings", on Stack Overflow: https://stackoverflow.com/questions/50342462/
