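I am trying to use word boundaries in a Python regular expression to find standalone m's. Each m should have whitespace, or the start/end of the string, on both sides: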
r = re.compile("\\bm\\b")
re.findall(r, someString)
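However, because an apostrophe counts as a word boundary, this also finds the m in words such as I'm. How can I write a regular expression that does not treat an apostrophe as a word boundary?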
I have tried this:
r = re.compile("(\\sm\\s) | (^m) | (m$)")
re.findall(r, someString)
But this does not match any m at all. Strange.
Best answer
Use lookaround assertions:
>>> import re
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "I'm a boy")
[]
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "I m a boy")
['m']
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "mama")
['m']
>>> re.findall(r'(?<=\s)m(?=\s)|^m|m$', "pm")
['m']
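Note that the pattern above still reports a match for "mama" and "pm", because the ^m and m$ branches check only one side of the m. If a standalone m must have whitespace or a string edge on both sides, one possible variant uses negative lookarounds instead. This is a sketch, not part of the original answer; the names STANDALONE_M and find_standalone_m are hypothetical.

```python
import re

# Hypothetical names, chosen for illustration only.
# (?<!\S) succeeds when the previous character is NOT non-whitespace,
# i.e. it is whitespace or there is no previous character (start of string).
# (?!\S) applies the same idea to the following character.
STANDALONE_M = re.compile(r"(?<!\S)m(?!\S)")


def find_standalone_m(text):
    """Return every m with whitespace or a string edge on both sides."""
    return STANDALONE_M.findall(text)


for sample in ["I'm a boy", "I m a boy", "mama", "pm", "m", "m m"]:
    print(repr(sample), find_standalone_m(sample))
```

Because neither lookaround consumes any characters, adjacent occurrences such as "m m" are both found, and the apostrophe in "I'm" blocks the match simply because it is a non-whitespace character.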
(?=...)
Matches if ... matches next, but doesn't consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it's followed by 'Asimov'.
(?<=...)
Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in 'abcdef', ...
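The two documentation examples can be tried directly. The snippet below is a small sketch; the sample strings and the variable names lookahead_hits and lookbehind_match are made up for illustration.

```python
import re

# Lookahead: "Isaac " matches only where "Asimov" follows, and "Asimov"
# itself is not part of the match.
lookahead_hits = re.findall(r"Isaac (?=Asimov)", "Isaac Asimov, Isaac Newton")
print(lookahead_hits)  # ['Isaac ']

# Lookbehind: "def" matches only where "abc" ends right before it, and
# "abc" is not part of the match.
lookbehind_match = re.search(r"(?<=abc)def", "abcdef")
print(lookbehind_match.group(0), lookbehind_match.span())  # def (3, 6)

# With different text in front of "def", the lookbehind fails.
print(re.search(r"(?<=abc)def", "xyzdef"))  # None
```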
By the way, if you use a raw string (r'this is raw string'), you do not need to escape the backslash (\).
>>> r'\s' == '\\s'
True
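As for the second attempt in the question, a likely reason it finds nothing is that the spaces around each | are literal characters in the pattern: the first branch then needs an extra space after \s, and the other two need a space before ^m or m$. The sketch below compares that pattern with a version without the spaces and with the re.VERBOSE flag, which makes the regex engine ignore unescaped whitespace in the pattern. The variable names are hypothetical.

```python
import re

text = "I m a boy"

# Pattern as written in the question: the spaces around "|" must be matched.
with_spaces = re.compile(r"(\sm\s) | (^m) | (m$)")
print(with_spaces.findall(text))  # []

# Same alternation with the literal spaces removed.
without_spaces = re.compile(r"(\sm\s)|(^m)|(m$)")
print(without_spaces.findall(text))  # [(' m ', '', '')]

# re.VERBOSE ignores unescaped whitespace, so the spaced layout can stay.
verbose = re.compile(r"(\sm\s) | (^m) | (m$)", re.VERBOSE)
print(verbose.findall(text))  # [(' m ', '', '')]
```

Since the pattern has three capturing groups, findall returns one tuple per match, and \s consumes the surrounding whitespace. The lookaround version avoids both effects.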
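Regarding "python - How to separate words with a regular expression in Python while taking words with apostrophes into account?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/19331391/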