python - Capturing apostrophes with a regular expression

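I am using Python's re module to capture all forms of the word color in American English (AmE) and British English (BrE). I can capture almost all of them, except the ones that end with an apostrophe, e.g. colors'. The exercise comes from Watt's Beginning Regular Expressions book.

Here is the sample text: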

Red is a color.
His collar is too tight or too colouuuurful.
These are bright colours.
These are bright colors.
Calorific is a scientific term.
“Your life is very colorful,” she said.
color (U.S. English, singular noun)
colour (British English, singular noun)
colors (U.S. English, plural noun)
colours (British English, plural noun)
color’s (U.S. English, possessive singular)
colour’s (British English, possessive singular)
colors’ (U.S. English, possessive plural)
colours’ (British English, possessive plural)

Here is my regex: \bcolou?r(?:[a-zA-Z’s]+)?\b

Explanation:

\b                 # Start at word boundary
colou?r            #u is optional for AmE
    (?:            #non-capturing group
    [a-zA-Z’s]+    #color could be followed by modifier (e.g.ful, or apostrophe)
    )?             #End non-capturing group; these letters are optional
\b                 # End at word boundary
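
The commented pattern above can be run almost verbatim with Python's re.VERBOSE flag, which ignores whitespace and # comments inside the pattern. The following is a minimal sketch to reproduce the behaviour described in the question; the names PATTERN, SAMPLES and find_all are illustrative and not part of the original post, and the sample lines are taken from the text above.

```python
import re

# The question's pattern, compiled with re.VERBOSE so that layout
# whitespace and "#" comments inside the pattern are ignored.
PATTERN = re.compile(
    r"""
    \b                 # start at a word boundary
    colou?r            # the u is optional (AmE spelling)
    (?:                # non-capturing group
        [a-zA-Z’s]+    # letters or a typographic apostrophe
    )?                 # the whole suffix is optional
    \b                 # end at a word boundary
    """,
    re.VERBOSE,
)

# A few lines from the sample text in the question.
SAMPLES = [
    "These are bright colours.",
    "color’s (U.S. English, possessive singular)",
    "colors’ (U.S. English, possessive plural)",
]


def find_all(text):
    """Return every full match of PATTERN in text."""
    return PATTERN.findall(text)


for line in SAMPLES:
    print(find_all(line))
```

The last sample yields colors without the trailing apostrophe, which is the symptom described below.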

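The problem is that colors' and colours' are only matched up to the s; the apostrophe is left out. Can someone explain what is wrong with my pattern? I have looked at the SO question "Regex Apostrophe how to match?" as well as questions about escaping ' and ".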

Here is the Regex101 demo.

Thanks in advance.

Best answer

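The problem is that \b only matches at a word boundary, that is, between a word character and a non-word character. In ...lors', the position between the ' and the following space is not a word boundary, because neither ' nor a space is a word character. The engine therefore backtracks to the last position where \b does succeed, right after the s. Instead of the final \b, use a lookahead for a space, period, comma, or whatever else may follow the word: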

\bcolou?r(?:[a-zA-Z’s]+)?(?=[ .,])

https://regex101.com/r/lB49Nr/3
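
As a rough illustration of the difference, the sketch below runs the original pattern and the lookahead version side by side. The names ORIGINAL, LOOKAHEAD, LOOKAHEAD_OR_END and matches are made up for this example. The third pattern is an assumption that goes beyond the answer: it additionally accepts the end of the string, since (?=[ .,]) alone cannot match a word that is the very last thing in the input.

```python
import re

# Pattern from the question: ends with a word boundary.
ORIGINAL = re.compile(r"\bcolou?r(?:[a-zA-Z’s]+)?\b")

# Pattern from the answer: ends with a lookahead for space, period or comma.
LOOKAHEAD = re.compile(r"\bcolou?r(?:[a-zA-Z’s]+)?(?=[ .,])")

# Hypothetical variant (not from the answer): also accept end of string.
LOOKAHEAD_OR_END = re.compile(r"\bcolou?r(?:[a-zA-Z’s]+)?(?=[ .,]|$)")


def matches(pattern, text):
    """Return every full match of a compiled pattern in text."""
    return pattern.findall(text)


line = "colours’ (British English, possessive plural)"

# \b cannot match between the apostrophe and the space (both are non-word
# characters), so the original pattern backtracks and stops after the s.
print(matches(ORIGINAL, line))

# The lookahead only requires that a space, period or comma comes next,
# so the apostrophe stays inside the match.
print(matches(LOOKAHEAD, line))

# With nothing after the word, only the variant with |$ finds a match.
print(matches(LOOKAHEAD, "bright colors"))
print(matches(LOOKAHEAD_OR_END, "bright colors"))
```

Which characters belong in the lookahead depends on the text being searched; the set [ .,] covers the sample text above but is not exhaustive.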

This question and answer come from Stack Overflow: https://stackoverflow.com/questions/52685872/
