python-3.x - Regex for all-caps text at the start of a text block in Python

Tags: python-3.x regex

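I am having trouble putting together a regex pattern that separates the all-caps words at the start of a paragraph from the rest of the paragraph.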

    text_example = """HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text
    that I am also interested in extracting and that will have a variety of Information,
    symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well."""

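Say I have the text above. I want to capture "HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START." (the run of all-caps words and punctuation at the start of the paragraph) separately from the rest, "There is a lot of text that I am also interested in..." (text that contains symbols, numbers, and some capital letters at the start of new sentences, names, and so on).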

I have been working with the following pattern, but it is not quite right:

import re

pattern = re.compile(r"([A-Z]+\s?[A-Z]+[^a-z0-9])(.*)")
# Search once and reuse the match object for both groups.
match = pattern.search(text_example)
passage_start = match.group(1)
passage_remaining = match.group(2)

print(passage_start)
print()
print(passage_remaining)

When I run this, I get:

HERE IS

SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text that I 
am also interested in extracting and that will have a variety of Information, symbols 
@#$^*&^ and even amounts such as $4,123,156 to be included as well.

Any help would be appreciated! Thanks.

Best answer

You can use

^([^a-z]+)\b(.*)

See the regex demo. Details:

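  • ^ - start of the string
  • ([^a-z]+) - Group 1: one or more characters other than lowercase ASCII letters
  • \b - a word boundary
  • (.*) - Group 2: any zero or more characters, as many as possible (when re.DOTALL is passed, . also matches line breaks).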
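
To see why the \b matters, here is a minimal sketch that compares the pattern with and without it. The shortened sample string and the variable names are assumptions made for illustration and do not come from the question. Without the word boundary, the greedy [^a-z]+ also swallows the capital T of "There"; with it, the engine backtracks to the last position where a word boundary exists.

```python
import re

# Shortened, hypothetical sample; not the asker's full text.
sample = "ANOTHER START. There is more text."

with_boundary = re.search(r"^([^a-z]+)\b(.*)", sample, re.DOTALL)
without_boundary = re.search(r"^([^a-z]+)(.*)", sample, re.DOTALL)

print(repr(with_boundary.group(1)))     # 'ANOTHER START. '
print(repr(without_boundary.group(1)))  # 'ANOTHER START. T'
```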

See the Python demo:

import re
rx = r"^([^a-z]+)\b(.*)"
text = "HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text that I am also interested in extracting and that will have a variety of Information, symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well."
m = re.search(rx, text, re.DOTALL)
if m:
    print(m.group(1)) # HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. 
    print(m.group(2)) # There is a lot of text that I am also interested in extracting and that will have a variety of Information, symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well.
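
As a hedged sketch, the same pattern can be wrapped in a small helper that also copes with multi-line, triple-quoted text like the question's. The function name split_caps_prefix, the constant CAPS_PREFIX, the shortened sample text, and the whitespace stripping are all assumptions added for illustration. Note that [^a-z] matches anything that is not a lowercase ASCII letter, so digits and symbols at the start of the text would land in the first part as well.

```python
import re

# Same pattern as in the answer; re.DOTALL lets (.*) run across newlines.
CAPS_PREFIX = re.compile(r"^([^a-z]+)\b(.*)", re.DOTALL)


def split_caps_prefix(text):
    """Return (caps_prefix, remainder), or None when nothing matches.

    Hypothetical helper: the name and the stripping are illustrative choices.
    """
    m = CAPS_PREFIX.search(text)
    if m is None:
        return None
    return m.group(1).strip(), m.group(2).strip()


# Shortened multi-line sample based on the question's text.
text_example = """HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text
that I am also interested in extracting."""

start, remaining = split_caps_prefix(text_example)
print(start)      # HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START.
print(remaining)
```

Returning None for text with no leading all-caps run (for example, text that starts with a lowercase letter) leaves the decision about how to handle that case to the caller.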

Regarding python-3.x - Regex for all-caps text at the start of a text block in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/66566400/
