python-3.x - Regex for all-caps words at the start of a text block in Python

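I am having a hard time putting together a regex pattern that separates the all-caps words at the start of a paragraph from the rest of the paragraph.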

    text_example = """HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text
    that I am also interested in extracting and that will have a variety of Information,
    symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well."""

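Say I have the text above. I want to capture two parts: (1) "HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START.", that is, the run of all-caps words and punctuation at the start of the paragraph; and (2) the remainder, "There is a lot of text that I am also interested...", which is text with symbols, digits, and some capital letters at the start of new sentences, names, etc.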

I have been working with the following pattern, but it is not quite right:

import re

pattern = re.compile(r"([A-Z]+\s?[A-Z]+[^a-z0-9])(.*)")
match = pattern.search(text_example)  # search once and reuse the match object
passage_start = match.group(1)
passage_remaining = match.group(2)

print(passage_start)
print()
print(passage_remaining)

When I run this, I get:

HERE IS

SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text that I 
am also interested in extracting and that will have a variety of Information, symbols 
@#$^*&^ and even amounts such as $4,123,156 to be included as well.

Any help would be appreciated! Thanks.

Best answer

You can use

^([^a-z]+)\b(.*)

See the regex demo. Details:

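^ - start of the string
([^a-z]+) - Group 1: one or more characters other than lowercase ASCII letters
\b - a word boundary; the greedy Group 1 has to backtrack to one, so it gives back the capital letter that starts the first mixed-case word
(.*) - Group 2: any zero or more characters, as many as possible (with re.DOTALL, . also matches line breaks).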
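To see what the \b contributes, here is a small sketch that runs the pattern with and without the word boundary. The short sample string and the variable names are made up for illustration and do not come from the original answer.

```python
import re

sample = "ABC DEF. Ghi jkl"  # hypothetical sample, not from the question

# Without \b, the greedy [^a-z]+ also swallows the capital "G" of "Ghi".
without_boundary = re.search(r"^([^a-z]+)(.*)", sample)

# With \b, the engine backtracks to the nearest word boundary, just before "G".
with_boundary = re.search(r"^([^a-z]+)\b(.*)", sample)

print(repr(without_boundary.group(1)), repr(without_boundary.group(2)))
# 'ABC DEF. G' 'hi jkl'
print(repr(with_boundary.group(1)), repr(with_boundary.group(2)))
# 'ABC DEF. ' 'Ghi jkl'
```

Note that Group 1 keeps the trailing space before the boundary; strip it if it is not wanted.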

See the Python demo:

import re
rx = r"^([^a-z]+)\b(.*)"
text = "HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text that I am also interested in extracting and that will have a variety of Information, symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well."
m = re.search(rx, text, re.DOTALL)
if m:
    print(m.group(1)) # HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. 
    print(m.group(2)) # There is a lot of text that I am also interested in extracting and that will have a variety of Information, symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well.
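The text in the question is a triple-quoted string spanning several lines, so a small wrapper may be convenient. This is only a sketch: the function name split_caps_lead, the trimming of trailing whitespace, and the empty-string fallback when nothing matches are assumptions added here, not part of the original answer.

```python
import re

# Same pattern as in the answer; re.DOTALL lets (.*) run across line breaks.
CAPS_LEAD = re.compile(r"^([^a-z]+)\b(.*)", re.DOTALL)


def split_caps_lead(text):
    """Return (all-caps lead, remainder); the lead is '' when nothing matches."""
    m = CAPS_LEAD.search(text)
    if m is None:
        return "", text
    # Group 1 usually ends with the whitespace before the boundary, so trim it.
    return m.group(1).strip(), m.group(2)


text_example = """HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text
that I am also interested in extracting and that will have a variety of Information,
symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well."""

lead, rest = split_caps_lead(text_example)
print(lead)
print(rest)
```

One limitation to keep in mind: Group 1 is defined as "anything but lowercase ASCII letters", so digits, symbols, and a one-letter capitalized word such as "I" that directly follows the all-caps run also end up in the first part.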

A similar question about this topic (python-3.x - Regex for all-caps words at the start of a text block in Python) can be found on Stack Overflow: https://stackoverflow.com/questions/66566400/
