python - Capturing part of a text with a non-greedy regular expression

Tags: python regex python-2.7

Using re.findall, I want to extract the value assigned to each PCR.

>>> z
'PCR-09: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-13: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\nPCR-16: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 \r\n

>>> print z
PCR-09: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-11: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-12: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-13: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-14: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-15: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
PCR-16: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

At first I tried the following. Can someone point out what is wrong with the regular expression?

>>> re.search('PCR-09:(.*?)', z).groups()
('',)

Shouldn't the non-greedy expression (.*?) match every character until a newline is found?

With a slight change to the regular expression, I got the result I wanted:

>>> re.search('PCR-09:(.*?)\s\r\n', z).groups()
(' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00',)

Along the same lines, this does not work:

>>> re.findall(r'(PCR-\d+):(.*?)', z)
[('PCR-09', ''), ('PCR-10', ''), ('PCR-11', ''), ('PCR-12', ''), ('PCR-13', ''), ('PCR-14', ''), ('PCR-15', ''), ('PCR-16', ''), 

But this does:

>>> re.findall(r'(PCR-\d+):(.*?)\s\r\n', z,re.DOTALL)
[('PCR-09', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-10', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-11', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-12', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-13', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-14', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-15', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'), ('PCR-16', ' 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00'),

I hope someone can explain what is wrong with my approach.

Thanks

Best answer

r'PCR-09:(.*?)' does not do what you expect because a non-greedy quantifier stops as soon as the pattern as a whole can match.

Here (.*?) is allowed to match '' and nothing follows it in the pattern, so the match ends immediately with an empty group.
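A minimal sketch of this behavior, assuming a shortened two-line stand-in for the question's string z and Python 3 syntax (the question itself uses Python 2.7):

```python
import re

# Shortened stand-in for the question's string (an assumption for brevity).
z = 'PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n'

# Nothing follows the lazy group, so zero characters already satisfy the pattern.
lazy = re.search(r'PCR-09:(.*?)', z)
print(repr(lazy.group(1)))    # ''

# The greedy form takes as much as it can. Without re.DOTALL, "." stops
# before "\n", so the captured text still includes the trailing "\r".
greedy = re.search(r'PCR-09:(.*)', z)
print(repr(greedy.group(1)))  # ' 00 00 00 \r'
```

So a lazy quantifier does not mean "up to the newline"; it means "as little as possible while the rest of the pattern still matches".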

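By contrast, r'(PCR-\d+):(.*?)\s\r\n' is also non-greedy, but because it still has to find '\s\r\n' after the group, the group is forced to expand until that suffix matches.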
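The forced expansion can be seen in a short sketch, again assuming a shortened stand-in for z and Python 3 syntax:

```python
import re

# Shortened stand-in for the question's string (an assumption for brevity).
z = 'PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n'

# The lazy group grows one character at a time until "\s\r\n" can match
# right after it, which first happens at the end of the line.
m = re.search(r'PCR-09:(.*?)\s\r\n', z)
print(repr(m.group(1)))  # ' 00 00 00'

# The same pattern with findall, with and without re.DOTALL.
plain = re.findall(r'(PCR-\d+):(.*?)\s\r\n', z)
dotall = re.findall(r'(PCR-\d+):(.*?)\s\r\n', z, re.DOTALL)
print(plain)             # [('PCR-09', ' 00 00 00'), ('PCR-10', ' 00 00 00')]
print(plain == dotall)   # True
```

On this input re.DOTALL makes no difference, because the lazy group stops at the first position where the suffix matches and never needs to cross a newline.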

I suggest a greedy regular expression that only allows the characters you expect to find: r'(PCR-\d+):([0-9 ]*)'
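A sketch of the suggested pattern in use, assuming a shortened stand-in for z and Python 3 syntax. The name values and the split into a dictionary are illustrative additions, not part of the original answer:

```python
import re

# Shortened stand-in for the question's string (an assumption for brevity).
z = 'PCR-09: 00 00 00 \r\nPCR-10: 00 00 00 \r\n'

# The character class only accepts digits and spaces, so the greedy match
# ends by itself at "\r" and no trailing anchor is needed.
pairs = re.findall(r'(PCR-\d+):([0-9 ]*)', z)
print(pairs)   # [('PCR-09', ' 00 00 00 '), ('PCR-10', ' 00 00 00 ')]

# Hypothetical post-processing: split() drops the surrounding spaces.
values = {name: digits.split() for name, digits in pairs}
print(values)  # {'PCR-09': ['00', '00', '00'], 'PCR-10': ['00', '00', '00']}
```

Note that the captured group keeps its leading and trailing spaces. If the real values can contain hexadecimal letters, which the sample does not show, the class would need widening to something like [0-9A-Fa-f ].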

This page is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/26807554/
