Python re.match only matches up to the first \n

I am trying to wrap ping in Python (3.7.4) using subprocess and re.

The stdout returned by subprocess.run is a bytes object, so I had to make the regular expression a bytes pattern as well for the types to match.

    import re
    import subprocess

    out = subprocess.run(['ping', '-c', '1', '8.8.8.8'], capture_output=True)
    print(out.stdout)
    match = re.match(br'P(..)G', out.stdout, re.DOTALL | re.MULTILINE)
    if match:
        print(match.groups())

    match = re.match(br'trans(.)', out.stdout, re.DOTALL | re.MULTILINE)
    if match:
        print(match.groups())

Actual output of the ping command:

b'PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n64 bytes from 8.8.8.8: icmp_seq=1 ttl=53 time=60.7 ms\n\n--- 8.8.8.8 ping statistics ---\n1 packets transmitted, 1 received, 0% packet loss, time 0ms\nrtt min/avg/max/mdev = 60.665/60.665/60.665/0.000 ms\n'

Output of the first match.groups() call:

(b'IN',)

The second one prints nothing (it should be (b'm',)). In fact, nothing after the first \n can be matched.

Note that I do pass re.MULTILINE, and converting the output to str with str() or .decode() makes no difference.

I checked the pattern in several different online tools and it works in all of them. Any ideas?

Best answer

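re.match only tries to match at the very first position of the data. Your output does not start with trans, which is why the second pattern does not match. You could write .*?trans(.) to say that trans appears somewhere in the middle of the text (together with the re.DOTALL you already pass, so that . can cross the newlines), but I think you should use search instead: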

    match = re.search(br'trans(.)', out.stdout)
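
To see the difference on data shaped like the ping output in the question, here is a minimal sketch. The `sample` bytes are a hard-coded stand-in for `out.stdout` (an assumption, so that the snippet runs without actually calling ping):

```python
import re

# Hard-coded stand-in for out.stdout, so no real ping call is needed.
sample = (
    b'PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n'
    b'64 bytes from 8.8.8.8: icmp_seq=1 ttl=53 time=60.7 ms\n'
    b'\n'
    b'--- 8.8.8.8 ping statistics ---\n'
    b'1 packets transmitted, 1 received, 0% packet loss, time 0ms\n'
)

# match() is anchored at position 0, so only a pattern that fits there succeeds.
at_start = re.match(br'P(..)G', sample)
not_at_start = re.match(br'trans(.)', sample)

# search() scans forward and returns the first match anywhere in the data.
found = re.search(br'trans(.)', sample)

print(at_start.groups())  # (b'IN',)
print(not_at_start)       # None
print(found.groups())     # (b'm',)
```

No flags are needed for the search call, because the pattern contains neither ^/$ anchors nor a . that has to cross a newline.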

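Notes:

re.DOTALL is only needed when you want . to include \n. With it, . matches any character, including \n.
re.MULTILINE: by default ^ matches only at the start of the text and $ only at the end of the text (or just before a trailing newline). When you compile your regex with this flag, ^ also matches at the start of every line and $ at the end of every line (just before each \n).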
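
The two flags can be compared side by side in a short sketch. The two-line `text` value is made up for illustration. Note in particular that re.MULTILINE changes what ^ and $ mean but does not make match() try the start of each line:

```python
import re

text = b'first line\nsecond line\n'

# Without MULTILINE, ^ only matches at the very start of the data.
plain_anchor = re.search(br'^second', text)
# With MULTILINE, ^ also matches right after each newline.
multiline_anchor = re.search(br'^second', text, re.MULTILINE)

# MULTILINE does not change match(): it is still anchored at position 0.
match_multiline = re.match(br'second', text, re.MULTILINE)

# Without DOTALL, . stops at the newline, so .* cannot reach the second line.
no_dotall = re.match(br'.*second', text)
# With DOTALL, . also matches \n, so the match can span both lines.
with_dotall = re.match(br'.*second', text, re.DOTALL)

print(plain_anchor)              # None
print(multiline_anchor.group())  # b'second'
print(match_multiline)           # None
print(no_dotall)                 # None
print(with_dotall.group())       # b'first line\nsecond'
```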

The problem you ran into comes from the way match works. Check this example:

    import re

    pattern = r'HELLO (\w+)'

    # Works fine because the text starts with HELLO.
    print(re.match(pattern, 'HELLO X').groups())
    # Does not match because the text does not start with HELLO.
    m = re.match(pattern, 'CHELLO X')
    print(m is None)  # True

Unless the pattern itself allows for some characters before HELLO, match only tries to match from the first position.

Explaining DOTALL:

    import re

    text = '\nHELLO X'
    pattern = re.compile(r'.*?HELLO (\w+)')
    pattern_dotall = re.compile(r'.*?HELLO (\w+)', re.DOTALL)

    # True: . does not match \n, so .*? cannot get past the leading newline.
    print(re.match(pattern, text) is None)
    # False: with DOTALL, . also matches \n, so the pattern matches.
    print(re.match(pattern_dotall, text) is None)
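
Putting the pieces together, the parsing step of a ping wrapper could be sketched roughly as follows. The helper name `parse_ping`, the fields it extracts, and the exact patterns are illustrative assumptions based on the Linux-style output shown in the question. Other platforms format their ping output differently.

```python
import re


def parse_ping(stdout: bytes) -> dict:
    """Pull a few fields out of Linux-style ping output (hypothetical helper)."""
    # Decode once so that ordinary str patterns can be used.
    text = stdout.decode(errors='replace')
    result = {}

    # search() finds the summary line wherever it appears in the output.
    counts = re.search(r'(\d+) packets transmitted, (\d+) received', text)
    if counts:
        result['transmitted'] = int(counts.group(1))
        result['received'] = int(counts.group(2))

    # MULTILINE lets ^ match at the start of the rtt line, not just the text.
    rtt = re.search(r'^rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/', text, re.MULTILINE)
    if rtt:
        result['rtt_avg_ms'] = float(rtt.group(2))

    return result


# Hard-coded stand-in for out.stdout from the question.
sample = (
    b'PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.\n'
    b'64 bytes from 8.8.8.8: icmp_seq=1 ttl=53 time=60.7 ms\n'
    b'\n'
    b'--- 8.8.8.8 ping statistics ---\n'
    b'1 packets transmitted, 1 received, 0% packet loss, time 0ms\n'
    b'rtt min/avg/max/mdev = 60.665/60.665/60.665/0.000 ms\n'
)

parsed = parse_ping(sample)
print(parsed)  # {'transmitted': 1, 'received': 1, 'rtt_avg_ms': 60.665}
```

In the original script, the same helper would be called as parse_ping(out.stdout) after the subprocess.run call.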

Regarding "Python re.match only matches up to the first \n", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/58269802/
