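I'm having a hard time putting together a regex pattern that separates all the uppercase words at the start of a paragraph from the rest of the paragraph.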
text_example = """HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text
that I am also interested in extracting and that will have a variety of Information,
symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well."""
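Suppose I have the text above. I want to capture two parts:
- HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. (the run of all-caps words and punctuation at the beginning of the passage)
- the rest, There is a lot of text that I am also interested in... (text with symbols, numbers, and some capital letters at the start of new sentences, names, etc.)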
I've been working with the following pattern, but it's not quite right:
pattern = re.compile(r"([A-Z]+\s?[A-Z]+[^a-z0-9])(.*)")
passage_start = re.search(pattern, text).group(1)
passage_remaining = re.search(pattern, text).group(2)
print(passage_start)
print()
print(passage_remaining)
When I run this, I get:
HERE IS
SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text that I
am also interested in extracting and that will have a variety of Information, symbols
@#$^*&^ and even amounts such as $4,123,156 to be included as well.
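The output stops after two words because nothing in the attempted pattern repeats per word: [A-Z]+\s?[A-Z]+ allows at most two runs of capitals separated by one optional whitespace character, and [^a-z0-9] then consumes exactly one more character. A minimal sketch reproducing this on a shortened, single-line sample (the variable name sample is hypothetical, chosen for illustration):

```python
import re

# The attempted pattern: a run of capitals, one optional whitespace
# character, a second run of capitals, then exactly one character that is
# neither a lowercase letter nor a digit. No part of it repeats per word.
pattern = re.compile(r"([A-Z]+\s?[A-Z]+[^a-z0-9])(.*)")

# Hypothetical shortened single-line sample, for illustration only.
sample = "HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text"

m = pattern.search(sample)
print(repr(m.group(1)))  # 'HERE IS '
```

Group 1 ends after the space following "IS", so everything from "SOME TEXT" onward falls into group 2.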
I'd appreciate any help! Thanks.
Best answer
You can use
^([^a-z]+)\b(.*)
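See the regex demo. Details:
^ - start of the string
([^a-z]+) - Group 1: one or more characters other than lowercase ASCII letters
\b - a word boundary
(.*) - Group 2: any zero or more characters, as many as possible (line breaks included when re.DOTALL is used)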
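The \b is what keeps the capital "T" of "There" out of group 1: [^a-z]+ is greedy and "T" is not a lowercase letter, so on its own it runs up to "START. T". A word boundary cannot sit between "T" and "h", so the engine backtracks one character until group 1 ends just before "There". A small sketch comparing both variants on a shortened sample (the names without_b and with_b are illustrative only):

```python
import re

text = ("HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. "
        "There is a lot of text")

# Without \b: the greedy negated class also eats the "T" of "There",
# stopping only at the lowercase "h".
without_b = re.search(r"^([^a-z]+)", text).group(1)

# With \b: backtracking stops where a non-word char (the space) meets a
# word char ("T"), so group 1 ends right before "There".
with_b = re.search(r"^([^a-z]+)\b", text).group(1)

print(repr(without_b[-8:]))  # 'START. T'
print(repr(with_b[-7:]))     # 'START. '
```

Note that group 1 keeps the trailing space before "There"; call .strip() on it if you don't want that.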
See the Python demo:
import re
rx = r"^([^a-z]+)\b(.*)"
text = "HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text that I am also interested in extracting and that will have a variety of Information, symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well."
m = re.search(rx, text, re.DOTALL)
if m:
    print(m.group(1)) # HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START.
    print(m.group(2)) # There is a lot of text that I am also interested in extracting and that will have a variety of Information, symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well.
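The demo uses a single-line string, while text_example in the question spans three lines. A sketch applying the same pattern to that multi-line text, assuming you want the trailing space stripped from the all-caps part; it also shows what re.DOTALL contributes:

```python
import re

rx = re.compile(r"^([^a-z]+)\b(.*)", re.DOTALL)

text_example = """HERE IS SOME TEXT, AND HERE IS SOME MORE; AND ANOTHER START. There is a lot of text
that I am also interested in extracting and that will have a variety of Information,
symbols @#$^*&^ and even amounts such as $4,123,156 to be included as well."""

m = rx.search(text_example)
if m:
    passage_start = m.group(1).strip()  # drop the space left before "There"
    passage_remaining = m.group(2)      # spans all three lines thanks to re.DOTALL
    print(passage_start)
    print(passage_remaining)

# Without re.DOTALL, "." does not match "\n", so group 2 ends at the
# first line break.
first_line_only = re.search(r"^([^a-z]+)\b(.*)", text_example).group(2)
print(first_line_only)  # There is a lot of text
```

Since ^ without re.MULTILINE anchors only at the start of the whole string, the all-caps run is taken from the beginning of the passage, not from the start of each line.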
Source: Stack Overflow question "python-3.x - Regex for all caps at the start of a text block in Python": https://stackoverflow.com/questions/66566400/