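I want to identify a word in a sentence, but not if it starts with a non-alphanumeric character. It is fine if it ends with one.
An example of what I did: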
import re

words = ["THIS", "THAT"]
sentences = ["I want to identify THIS word.", "And THAT!", "But I do not want to identify !THIS word", "Or [THIS] word"]
for sentence in sentences:
    for word in words:
        # \b only requires a word/non-word transition, so "!THIS" and "[THIS]" match too
        word_re = re.search(r"\b(%s)\b" % word, sentence)
        if word_re:
            print("It's a match!")
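The code above reports a match for every sentence. I want something that matches only the first two sentences. Is it possible to do what I want with a regex?
Thanks!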
Best answer
You can use a regex like
(?<!\S)(?:THIS|THAT)\b
See the regex demo. Details:
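- (?<!\S) - a left-hand whitespace boundary: the match must be at the start of the string or immediately preceded by whitespace
- (?:THIS|THAT) - a non-capturing group matching THIS or THAT
- \b - a word boundary.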
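To make the difference between the two left-hand boundaries concrete, here is a minimal sketch that runs the question's \b pattern and the (?<!\S) pattern over the same four sentences. The helper name matching_sentences is hypothetical and exists only for this illustration.

```python
import re

sentences = [
    "I want to identify THIS word.",
    "And THAT!",
    "But I do not want to identify !THIS word",
    "Or [THIS] word",
]

# \b on the left only needs a non-word character (or string start) before the word.
word_boundary = re.compile(r"\b(?:THIS|THAT)\b")
# (?<!\S) on the left needs whitespace (or string start) before the word.
whitespace_boundary = re.compile(r"(?<!\S)(?:THIS|THAT)\b")


def matching_sentences(regex, texts):
    """Return the texts in which the compiled regex finds a match (hypothetical helper)."""
    return [text for text in texts if regex.search(text)]


wb_hits = matching_sentences(word_boundary, sentences)
ws_hits = matching_sentences(whitespace_boundary, sentences)

print(len(wb_hits))  # all four sentences match with \b on the left
print(ws_hits)       # only the first two sentences match with (?<!\S)
```

The trailing \b is kept in both patterns, which is why "And THAT!" still matches: punctuation after the word is allowed, only punctuation directly before it is rejected.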
See the Python demo:
import re

words = ["THIS", "THAT"]
sentences = ["I want to identify THIS word.", "And THAT!", "But I do not want to identify !THIS word", "Or [THIS] word"]
pattern = fr"(?<!\S)(?:{'|'.join(words)})\b"
for sentence in sentences:
    word_re = re.search(pattern, sentence)
    if word_re:
        print(f"'{sentence}' is a match!")
# => 'I want to identify THIS word.' is a match!
#    'And THAT!' is a match!
If THIS or THAT can contain special characters, replace
pattern = fr"(?<!\S)(?:{'|'.join(words)})\b"
with
pattern = fr"(?<!\S)(?:{'|'.join(map(re.escape, words))})\b"
so that each word is escaped and matched as literal text.
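As a rough illustration of why the escaping matters, the sketch below uses a made-up word, "A.B", whose dot is a regex metacharacter. The word and the two sample sentences are assumptions chosen for the example and do not come from the question.

```python
import re

# Hypothetical word list: the "." in "A.B" means "any character" unless escaped.
words = ["A.B"]

unescaped = fr"(?<!\S)(?:{'|'.join(words)})\b"
escaped = fr"(?<!\S)(?:{'|'.join(map(re.escape, words))})\b"

text_literal = "use A.B here"    # contains the word exactly
text_lookalike = "use AxB here"  # only resembles the word

# Without escaping, the dot matches "x", so the lookalike is a false positive.
unescaped_lookalike = bool(re.search(unescaped, text_lookalike))
# With re.escape, only the literal "A.B" is accepted.
escaped_lookalike = bool(re.search(escaped, text_lookalike))
escaped_literal = bool(re.search(escaped, text_literal))

print(unescaped_lookalike, escaped_lookalike, escaped_literal)
```

One caveat: the trailing \b still expects a word/non-word transition, so a word that ends in a non-word character (for example one ending in "+") would probably need a different right-hand boundary, such as (?!\S).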
Regarding python - regex to match a word, but only if it does not start with a non-alphanumeric character, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/67289995/