python - Remove consecutive repeats of a specific word in a line

For example, I have a string:

my_str = 'my example example string contains example some text'

What I want to do is remove all repeats of a specific word, but only when they occur consecutively. Expected result:

my example string contains example some text

I tried the following code:

import re
my_str = re.sub(' example +', ' example ', my_str)

or

my_str = re.sub('\[ example ]+', ' example ', my_str)

But neither works. I know there are many questions about re, but I still can't apply them correctly to my case.

Best answer

You need to create a group and quantify it:

import re
my_str = 'my example example string contains example some text'
my_str = re.sub(r'\b(example)(?:\s+\1)+\b', r'\1', my_str)
print(my_str) # => my example string contains example some text

# To build the pattern dynamically, if your word is not static
word = "example"
my_str = re.sub(r'(?<!\w)({})(?:\s+\1)+(?!\w)'.format(re.escape(word)), r'\1', my_str)

See the Python demo.

I added word boundaries because, judging by the spaces in the original code, a whole-word match is expected.
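To illustrate why the word boundaries matter, here is a small sketch comparing the pattern with and without \b. The sample string is hypothetical and does not come from the question.

```python
import re

# Hypothetical sample text: "example" occurs only inside longer words.
text = 'counterexample examples here'

# Without boundaries, the match straddles the two longer words.
no_boundaries = re.sub(r'(example)(?:\s+\1)+', r'\1', text)

# With \b on both sides, only whole-word repeats match, so nothing changes.
with_boundaries = re.sub(r'\b(example)(?:\s+\1)+\b', r'\1', text)

print(no_boundaries)    # counterexamples here
print(with_boundaries)  # counterexample examples here
```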

See the regex demo. Pattern details:

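\b - a word boundary (in the dynamic approach it is replaced with (?<!\w), which requires that no word character immediately precedes the current position, because re.escape may also have to handle "words" such as .word., and there \b could prevent the regex from matching)
(example) - Group 1 (referenced as \1 in the replacement pattern): the word example
(?:\s+\1)+ - 1 or more occurrences of:
- \s+ - 1 or more whitespace characters
- \1 - a backreference to the Group 1 value, i.e. the word example
\b - a word boundary (replaced with (?!\w) in the dynamic approach: no word character is allowed immediately after the current position).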
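The dynamic variant described above can be wrapped in a small helper. This is only a sketch: the function name collapse_repeats and the second sample string are assumptions made for illustration, not part of the original answer.

```python
import re

def collapse_repeats(text, word):
    """Collapse consecutive, whitespace-separated repeats of `word` into one.

    Hypothetical helper built around the dynamic pattern from the answer.
    """
    # re.escape makes any special characters in `word` match literally.
    pattern = r'(?<!\w)({})(?:\s+\1)+(?!\w)'.format(re.escape(word))
    return re.sub(pattern, r'\1', text)

print(collapse_repeats('my example example string contains example some text', 'example'))
# my example string contains example some text

# A "word" that starts and ends with a non-word character still works,
# because the lookarounds do not require a word character at the edges.
print(collapse_repeats('a .word. .word. .word. b', '.word.'))
# a .word. b
```

With \b in place of the lookarounds, the second call would leave the text unchanged, since there is no word boundary between a space and a dot.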

Remember that in Python 2.x you need to pass re.U if the \b word boundary has to be Unicode-aware.
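For comparison, in Python 3 str patterns are Unicode-aware by default, and the re.ASCII flag switches that off. The sketch below is an addition with a hypothetical sample string. It shows how much \b depends on that setting, which is similar in spirit to omitting re.U in Python 2.

```python
import re

# Hypothetical non-ASCII sample; the repeated word is Russian for "example".
text = 'пример пример текст'
pattern = r'\b(пример)(?:\s+\1)+\b'

# Default in Python 3: \w and \b follow Unicode rules for str patterns.
unicode_aware = re.sub(pattern, r'\1', text)

# With re.ASCII, Cyrillic letters are not word characters, so \b never matches here.
ascii_only = re.sub(pattern, r'\1', text, flags=re.ASCII)

print(unicode_aware)  # пример текст
print(ascii_only)     # пример пример текст
```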

Regarding python - removing consecutive repeats of a specific word in a line, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/48827207/
