I have a list of words,
and I am building a list of compiled regex objects from that word list.
import re
word = 'This is word of spy++'
wl = ['spy++','cry','fpp']
# Compiling r"\bspy++\b" raises "multiple repeat": the "+" signs are not escaped
regobjs = [re.compile(r"\b%s\b" % w.lower()) for w in wl]
for regobj in regobjs:
    print(re.search(regobj, word).group())
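But because of the ++ symbols, I get an error while creating the regex objects (error: multiple repeat).
How can I make the regex handle every kind of word in the word list?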
Requirements:
The regex should detect the exact word in the given text,
even if the word contains non-alphanumeric characters such as ++. The code above detects exact words, except those containing ++.
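Best answer
In addition to applying re.escape(),
you also need to drop the \b word boundary before or after a non-alphanumeric character,
otherwise the match will fail. \b only matches between a word character and a non-word character, so it never matches after a trailing + that is followed by a space or the end of the string.
Something like this (not very elegant, but I hope it gets the point across):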
import re
words = 'This is word of spy++'
wl = ['spy++','cry','fpp']
regobjs = []
for word in wl:
    eword = re.escape(word.lower())
    # Add \b only on a side where the word has a word character (alphanumeric or "_")
    if eword[0].isalnum() or eword[0]=="_":
        eword = r"\b" + eword
    if eword[-1].isalnum() or eword[-1]=="_":
        eword = eword + r"\b"
    regobjs.append(re.compile(eword))
for regobj in regobjs:
    match = regobj.search(words)
    if match:  # 'cry' and 'fpp' do not occur here, so search() returns None
        print(match.group())
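As a possible alternative to adding \b conditionally, the boundaries can be written as lookarounds. This is only a sketch and not part of the answer above: the helper name compile_word is made up for illustration, and re.IGNORECASE is used here in place of lowercasing the word.

```python
import re

def compile_word(word):
    """Compile a case-insensitive pattern matching `word` as a whole token.

    Hypothetical helper, not taken from the answer above.
    """
    # (?<!\w) and (?!\w) assert that no word character touches the match.
    # Unlike \b, they also work when the word starts or ends with
    # punctuation such as "+".
    return re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", re.IGNORECASE)

text = 'This is word of spy++'
wl = ['spy++', 'cry', 'fpp']

# Keep only the words that occur in the text as whole tokens.
found = [w for w in wl if compile_word(w).search(text)]
print(found)  # ['spy++']
```

One behavioral difference to be aware of: this version refuses to match spy++ when a word character follows it directly (as in spy++x), whereas the answer's code, which puts no boundary after the final +, would still match there.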
Source: a similar question on Stack Overflow, "python - handling the '++' sign in a Python regex": https://stackoverflow.com/questions/8295774/