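python - $ and Windows newlines in Python bytes regex

$ matches at the end of a line, which is defined as either the end of the string or any position followed by a newline character.

However, the Windows newline consists of two characters, '\r\n'. How can I make '$' recognize '\r\n' as a newline in bytes?

Here is what I have: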

# Python 3.4.2
import re

input = b'''
//today is a good day \r\n
//this is Windows newline style \r\n
//unix line style \n
...other binary data... 
'''

L = re.findall(rb'//.*?$', input, flags=re.DOTALL | re.MULTILINE)
for item in L:
    print(item)

The current output is:

b'//today is a good day \r'
b'//this is Windows newline style \r'
b'//unix line style '

But the expected output is:
b'//today is a good day '
b'//this is Windows newline style '
b'//unix line style '

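Best answer

It is not possible to redefine the anchor behavior.

To match // followed by any number of characters other than CR and LF, use a negated character class [^\r\n] with the * quantifier: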

L = re.findall(rb'//[^\r\n]*', input)

Note that this approach does not require the re.M and re.S flags.
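As a minimal, self-contained sketch of the negated character class approach: the `data` and `comments` names are illustrative (chosen here to avoid shadowing the built-in `input`), and the sample bytes only approximate the question's data.

```python
import re

# Sample data resembling the question: two CRLF-terminated lines and one LF-terminated line.
data = (
    b"//today is a good day \r\n"
    b"//this is Windows newline style \r\n"
    b"//unix line style \n"
    b"...other binary data..."
)

# [^\r\n]* stops before the first CR or LF, so no flags are required.
comments = re.findall(rb"//[^\r\n]*", data)

for item in comments:
    print(item)
```

Because the character class excludes both CR and LF, the match ends before either line terminator and the pattern never relies on `$` at all.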

Alternatively, you can add \r? before $ and wrap that part in a positive lookahead (you will also need the lazy *? quantifier with .):

rb'//.*?(?=\r?$)'

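The point of using a lookahead is that $ is itself a kind of lookahead, since it does not actually consume the \n character. So we can safely put it inside a lookahead together with the optional \r.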
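A short sketch of the lookahead variant, assuming the re.MULTILINE flag is passed so that $ can match before every newline and not only at the end of the data. The sample bytes and the `pattern` and `matches` names are made up for illustration.

```python
import re

# Hypothetical sample: one CRLF line, one LF line, and a final line without a newline.
data = b"//first \r\n//second \n//third"

# The lazy .*? grows one character at a time until the lookahead succeeds,
# that is, until an optional CR followed by an end-of-line position is next.
pattern = re.compile(rb"//.*?(?=\r?$)", re.MULTILINE)
matches = pattern.findall(data)

for item in matches:
    print(item)
```

Since the lookahead consumes nothing, the trailing CR is left out of each match.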

Maybe this is not that relevant, since it comes from MSDN, but I think the same holds for Python:

Note that $ matches \n but does not match \r\n (the combination of carriage return and newline characters, or CR/LF). To match the CR/LF character combination, include \r?$ in the regular expression pattern.
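The quoted note describes .NET, but the same behavior can be checked with Python's re module. A small sketch with a made-up sample line (`line`, `plain`, and `with_cr` are illustrative names):

```python
import re

# Hypothetical sample: a single line terminated by CRLF.
line = b"abc\r\n"

# "$" matches just before "\n", but "c" is followed by "\r", so this search fails.
plain = re.search(rb"c$", line, re.MULTILINE)

# Allowing an optional "\r" before "$" lets the pattern match; here the "\r" is consumed.
with_cr = re.search(rb"c\r?$", line, re.MULTILINE)

print(plain)            # None
print(with_cr.group())  # b'c\r'
```

Unlike the lookahead form, this version includes the CR in the match, which is why the answer wraps `\r?$` in `(?=...)`.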

In PCRE, you can override the default behavior of the $ anchor with (*ANYCRLF), (*CR), or (*ANY), but this is not possible in Python.

This question and answer, "python - $ and Windows newlines in Python bytes regex", come from Stack Overflow: https://stackoverflow.com/questions/31399999/
