python - Capture all consecutive all-caps words with regex in Python?

I am trying to match all consecutive all-caps words/phrases using a regex in Python. Given the following:

    text = "The following words are ALL CAPS. The following word is in CAPS."

I would like the code to return:

    ALL CAPS, CAPS

I am currently using:

    matches = re.findall('[A-Z\s]+', text, re.DOTALL)

But this returns:

    ['T', ' ', ' ', ' ', ' ALL CAPS', ' T', ' ', ' ', ' ', ' ', ' CAPS']

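I obviously don't want the punctuation or the 'T'. I only want to return runs of consecutive words, or single words, that consist solely of capital letters.

Thanks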

Best answer

This does the job. Note that the pattern also accepts up to one lowercase letter in each word:

    import re

    text = "tHE following words aRe aLL CaPS. ThE following word Is in CAPS."
    # Group 1 wraps the whole pattern, so findall returns the full matched run.
    matches = re.findall(r"(\b(?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+)\b(?:\s+(?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+)\b)*)", text)
    print(matches)

Output:

['tHE', 'aLL CaPS', 'ThE', 'Is', 'CAPS']
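
Because the pattern above tolerates one lowercase letter per word, it also returns mixed-case words such as 'tHE'. If only strictly all-caps runs are wanted, as in the question, a simpler pattern may be enough. The following is a minimal sketch, not part of the original answer; the name `ALL_CAPS_RUN` is made up for illustration, and only ASCII capitals are assumed:

```python
import re

text = "The following words are ALL CAPS. The following word is in CAPS."

# One all-caps word, optionally followed by more all-caps words
# separated by whitespace. The word boundaries reject the "T" of "The".
ALL_CAPS_RUN = re.compile(r"\b[A-Z]+(?:\s+[A-Z]+)*\b")

matches = ALL_CAPS_RUN.findall(text)
print(", ".join(matches))  # ALL CAPS, CAPS
```

Single-letter capitalized words such as "I" or "A" would also match this sketch; replacing `[A-Z]+` with `[A-Z]{2,}` is one way to exclude them. The explanation that follows refers to the original, more permissive pattern.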

Explanation:

    (           : start group 1
      \b        : word boundary
      (?:       : start non-capturing group
        [A-Z]+  : 1 or more capital letters
        [a-z]?  : 0 or 1 lowercase letter
        [A-Z]*  : 0 or more capital letters
       |        : OR
        [A-Z]*  : 0 or more capital letters
        [a-z]?  : 0 or 1 lowercase letter
        [A-Z]+  : 1 or more capital letters
      )         : end non-capturing group
      \b        : word boundary
      (?:       : start non-capturing group
        \s+     : 1 or more whitespace characters
        (?:[A-Z]+[a-z]?[A-Z]*|[A-Z]*[a-z]?[A-Z]+) : same as above
        \b      : word boundary
      )*        : repeat the non-capturing group 0 or more times
    )           : end group 1
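
The same breakdown can be kept next to the pattern itself by compiling it with `re.VERBOSE`, which ignores whitespace in the pattern and allows `#` comments. This is only a sketch of that idea: the name `PATTERN` is made up for illustration, and the outer capture group is dropped because `findall` returns the whole match when the pattern has no capture groups.

```python
import re

PATTERN = re.compile(
    r"""
    \b                          # word boundary
    (?:                         # first word, one of:
        [A-Z]+ [a-z]? [A-Z]*    #   capitals, optional lowercase letter, capitals
      |                         #   OR
        [A-Z]* [a-z]? [A-Z]+    #   optional lowercase letter before the capitals
    )
    \b                          # word boundary
    (?:                         # further words, repeated 0 or more times
        \s+                     #   1 or more whitespace characters
        (?: [A-Z]+ [a-z]? [A-Z]* | [A-Z]* [a-z]? [A-Z]+ )
        \b                      #   word boundary
    )*
    """,
    re.VERBOSE,
)

text = "tHE following words aRe aLL CaPS. ThE following word Is in CAPS."
matches = PATTERN.findall(text)
print(matches)  # ['tHE', 'aLL CaPS', 'ThE', 'Is', 'CAPS']
```

Note that 'aRe' is skipped because it contains a capital but ends in a second lowercase letter, which neither alternative allows.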

Regarding "python - Capture all consecutive all-caps words with regex in Python?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/43523189/
