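python - Match everything after a series of hyphens

I am trying to capture all of the remaining text in a file after three hyphens at the start of a line (---).

Example: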

Anything above this first set of hyphens should not be captured.

---

This is content. It should be captured.
Any sets of three hyphens beyond this point should be ignored.

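Everything after the first set of three hyphens should be captured. The closest I have gotten is the regex [^(---)]+$, which only partly works. It captures everything after the hyphens, but if the user places any hyphens after that point, it captures only what follows the last hyphen the user placed.

I am using this with Python to capture the text.

If anyone can help me sort out this regex problem, I would appreciate it.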

Best answer

pat = re.compile(r'(?ms)^---(.*)\Z')

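(?ms) adds the MULTILINE and DOTALL flags.

The MULTILINE flag makes ^ match at the beginning of each line (not just at the beginning of the string). We need this because the --- occurs at the beginning of a line, but not necessarily at the beginning of the string.

The DOTALL flag makes . match any character, including newlines. We need this so that (.*) can match across multiple lines.

\Z matches only at the end of the string (as opposed to the end of a line). Because (.*) is greedy and the search takes the leftmost match, the capture runs from the first line-initial --- to the end of the string.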
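To make the role of each flag concrete, here is a small sketch that tries the same pattern with no flags, with each flag alone, and with both. The `sample` string and the variable names are made up for illustration.

```python
import re

# Illustrative input: the --- is at the start of a line, not of the string.
sample = "header\n---\nline one\nline two\n"

# No flags: ^ matches only at the start of the string, so nothing is found.
no_flags = re.search(r'^---(.*)\Z', sample)

# MULTILINE only: ^--- is found, but . stops at the newline,
# so (.*) can never reach the end of the string that \Z requires.
multiline_only = re.search(r'(?m)^---(.*)\Z', sample)

# DOTALL only: . crosses newlines, but ^ is still tied to the string start.
dotall_only = re.search(r'(?s)^---(.*)\Z', sample)

# Both flags, written as constants instead of the inline (?ms) prefix.
both = re.search(r'^---(.*)\Z', sample, re.MULTILINE | re.DOTALL)

print(no_flags, multiline_only, dotall_only)
print(repr(both.group(1)))
```

Only the last search succeeds, which is why the pattern needs both flags together.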

For example,

import re

text = '''\
Anything above this first set of hyphens should not be captured.

---

This is content. It should be captured.
Any sets of three hyphens beyond this point should be ignored.
'''

pat = re.compile(r'(?ms)^---(.*)\Z')
print(pat.search(text).group(1))

prints

This is content. It should be captured.
Any sets of three hyphens beyond this point should be ignored.
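If the file can contain further --- lines, the same pattern still starts at the first one and keeps the later ones as part of the captured content. The sketch below wraps that in a helper; the function name `content_after_first_rule`, the `doc` string, and the choice to strip surrounding whitespace are assumptions for illustration, not part of the original answer.

```python
import re

# Same pattern as above, with the flags passed as constants.
PAT = re.compile(r'^---(.*)\Z', re.MULTILINE | re.DOTALL)

def content_after_first_rule(text):
    """Return the text after the first line-initial '---', or None if absent."""
    match = PAT.search(text)
    if match is None:
        return None
    # Stripping is optional; it drops the newlines around the content.
    return match.group(1).strip()

# Hypothetical document with a second set of hyphens after the first.
doc = "intro\n\n---\n\nfirst part\n\n---\n\nsecond part\n"
result = content_after_first_rule(doc)
print(result)
```

Returning None when there is no --- line avoids the AttributeError that calling .group(1) on a failed search would raise.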


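Note that when you define a regex character class with brackets [...], the contents of the brackets are (in general, except for hyphenated ranges like a-z) interpreted as single characters. They are not patterns. So [---] is no different from [-]. In fact, [---] is the range of characters from - to -, inclusive.

Parentheses inside a character class are also interpreted as literal parentheses, not as grouping delimiters. So [(---)] does not mean "a group of three hyphens". Python reads (-- as the range from ( to -, which covers the characters ( ) * + , and -, followed by a literal - and a literal ). The class therefore contains the hyphen and both parentheses, plus the few punctuation characters that sit between them in ASCII.

So the pattern [^(---)]+ matches one or more characters that are not in that class, which means it stops at the first hyphen or parenthesis: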

In [23]: re.search('[^(---)]+', 'foo - bar').group()
Out[23]: 'foo '

In [24]: re.search('[^(---)]+', 'foo ( bar').group()
Out[24]: 'foo '

You can see where this is going, and why it does not solve your problem.
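To tie this back to the question, here is a sketch of the original pattern with its $ anchor, run on a made-up `text` that has a stray hyphen in the content. The `warnings` handling is only there because newer Python versions may emit a FutureWarning for -- inside a character class.

```python
import re
import warnings

# Illustrative input with one hyphen after the --- line.
text = "above\n---\ncontent\nmore - content\ntail\n"

with warnings.catch_warnings():
    # Silence the possible FutureWarning about "--" in a character class.
    warnings.simplefilter("ignore", FutureWarning)
    asker_pat = re.compile(r'[^(---)]+$')

# The match must be a hyphen-free run that reaches the end of the string,
# so it can only begin after the last hyphen.
captured = asker_pat.search(text).group()
print(repr(captured))

# The answer's pattern on the same input, for comparison.
wanted = re.search(r'(?ms)^---(.*)\Z', text).group(1)
print(repr(wanted))
```

The first result begins right after the last hyphen in the text, while the second keeps everything after the --- line, stray hyphen included.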

Regarding python - match everything after a series of hyphens, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/18413879/
