python - Regular expression (regex) to remove the word "and", non-alphanumeric characters, and whitespace from a string in Python

In Python, I am trying to clean (and then compare) artist names, and I want to remove:

non-alphabetic characters, or
whitespace, or
the word "and"

Input string: Bootsy Collins and The Rubber Band

Desired output: BootsyCollinsTheRubberBand

import re

s = 'Bootsy Collins and The Rubber Band'
res1 = re.sub(r'[^\w]|\s|\s+(and)\s', "", s)
res2 = re.sub(r'[^\w]|\s|\sand\s', "", s)
res3 = re.sub(r'[^\w]|\s|(and)', "", s)

print("\b", s, "\n"
      , "1st: ", res1, "\n"
      , "2nd: ", res2, "\n"
      , "3rd: ", res3)

Output:
Bootsy Collins and The Rubber Band 
 1st:  BootsyCollinsandTheRubberBand 
 2nd:  BootsyCollinsandTheRubberBand 
 3rd:  BootsyCollinsTheRubberB

Best answer

To support the rules you set out, rather than only the quoted sample text, you need a more general regex and the right flag on the re.sub call:

re.sub(r'\band\b|\W', '', s, flags=re.IGNORECASE)

Explanation

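The re.IGNORECASE flag is set so that "And" (and any other upper/lowercase variant) is removed as well. If you only want to remove the lowercase "and", drop the flag.
\band\b is the word "and" enclosed in word-boundary tokens \b on both sides. This matches the three-character sequence "and" only as a standalone word, not as a substring of another word such as "Band". Using \b rather than surrounding whitespace, as in \s+and\s, has the advantage that \b also detects a word boundary next to punctuation, as in "and,", which \s+and\s cannot match because a comma is not whitespace.
Whitespace \s is a subset of the non-word characters \W (\w matches letters, digits, and the underscore: [a-zA-Z0-9_] for ASCII text, plus Unicode letters and digits by default for str patterns in Python 3), so no separate token is needed for it. \W already covers \s, and the regex can be simplified by leaving \s out.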
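
The three points above can be checked directly. The following is a minimal sketch; the sample string and variable names are made up for illustration and are not from the question.

```python
import re

sample = 'Band and, sand And'  # made-up test string, not from the question

# \b anchors "and" as a whole word: it is found before the comma,
# but not inside "Band" or "sand".
whole_words = re.findall(r'\band\b', sample)

# With re.IGNORECASE the capitalised "And" is matched as well.
any_case = re.findall(r'\band\b', sample, flags=re.IGNORECASE)

# Requiring whitespace on both sides misses "and," because "," is not whitespace.
space_delimited = re.findall(r'\s+and\s', sample)

# Each of these whitespace characters is also matched by \W,
# which is why \s is redundant next to it.
whitespace_is_nonword = all(re.fullmatch(r'\W', ch) for ch in ' \t\n')

print(whole_words, any_case, space_delimited, whitespace_is_nonword)
```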

Demo

Test case #1:

s = 'Bootsy Collins and The Rubber Band'
res = re.sub(r'\band\b|\W', '', s, flags=re.IGNORECASE)
print(res)

Output:
'BootsyCollinsTheRubberBand'

Test case #2 ('And' is removed):

s = 'Bootsy Collins And The Rubber Band'
res = re.sub(r'\band\b|\W', '', s, flags=re.IGNORECASE)
print(res)

Output:
'BootsyCollinsTheRubberBand'

Test case #3 ('and' followed by a comma is removed):

s = 'Bootsy Collins and, The Rubber Band'
res = re.sub(r'\band\b|\W', '', s, flags=re.IGNORECASE)
print(res)

Output:
'BootsyCollinsTheRubberBand'
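
Since the goal in the question is to clean names and then compare them, the three cases above could be folded into a small helper. This is only a sketch: the function name normalize_artist and the precompiled pattern _CLEAN are hypothetical and not part of the original answer.

```python
import re

# Hypothetical helper; the names are illustrative, not from the answer.
_CLEAN = re.compile(r'\band\b|\W', flags=re.IGNORECASE)

def normalize_artist(name):
    """Drop the standalone word "and" and every non-word character."""
    return _CLEAN.sub('', name)

names = [
    'Bootsy Collins and The Rubber Band',
    'Bootsy Collins And The Rubber Band',
    'Bootsy Collins and, The Rubber Band',
]

# All three spellings collapse to the same key, so the set has one element.
keys = {normalize_artist(n) for n in names}
print(keys)
```

Note that \W leaves digits and the underscore in place, since both count as word characters.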

Counterexample (the regex uses whitespace, \s+ or \s, instead of \b as the word boundary):

s = 'Bootsy Collins and, The Rubber Band'
res = re.sub(r'\s+(and)\s|\W', '',s)
print(res)

Output:   'and' is NOT removed
'BootsyCollinsandTheRubberBand'
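
The counterexample fails because of the comma. The first two attempts in the question appear to fail for a different reason, even without a comma: at each position the regex engine tries the alternatives from left to right, so the space before "and" is consumed by [^\w] before \s+(and)\s gets a chance to match. A small sketch of that effect, using the question's input string (the variable names are illustrative):

```python
import re

s = 'Bootsy Collins and The Rubber Band'

# [^\w] is listed first, so it consumes the space before "and";
# the later alternative \s+and\s then never finds its leading whitespace.
wrong_order = re.sub(r'[^\w]|\s+and\s', '', s)

# Listing \s+and\s first lets it match " and " as a unit.
right_order = re.sub(r'\s+and\s|[^\w]', '', s)

print(wrong_order)
print(right_order)
```

With \band\b the order of the alternatives does not matter, because \b consumes no characters and \W can never match at the letter "a" where \band\b starts.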

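For more on "python - Regular expression (regex) to remove the word "and", non-alphanumeric characters, and whitespace from a string in Python", see the similar question we found on Stack Overflow: https://stackoverflow.com/questions/66517097/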
