python - Regex greediness problem

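I need to parse a string in Python and extract 2 tokens separated by : (a colon). Each token may be enclosed in single quotes, double quotes, or no quotes at all.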

Working examples:

# <input string> -> <tuple that should return> 

1) abc:def -> (abc, def)
2) abc:"def" -> (abc, def)
3) "abc":def -> (abc, def)
4) "abc":"def" -> (abc, def)
5) "a:bc":abc -> (a:bc, abc)

Cases that do not work:

# <input string> -> <tuple that should return> 

6) abc:"a:bc" -> (abc, a:bc)
7) "abcdef" -> (abcdef,)

The regular expression I am using is:

>>> import re
>>> rex = re.compile(r"(?P<fquote>[\'\"]?)"
                     r"(?P<user>.+)"
                     r"(?P=fquote)"
                     r"(?:\:"
                     r"(?P<squote>[\'\"]?)"
                     r"(?P<pass>.+)"
                     r"(?P=squote))")

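I have two problems. First, cases 6) and 7) do not work. Second, after rex.match I want only the token groups to be returned, not the fquote and squote groups. That is, rex.match("'abc':'def'").groups() currently returns ("'", "abc", "'", "def"), and I only want ("abc", "def").

Any ideas?

Thanks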

Best answer

You can use the csv module here instead of a regular expression:

import csv

inputs = [
    'abc:def', 'abc:"def"', '"abc":def', '"abc":"def"', '"a:bc":abc',  # working
    'abc:"a:bc"', 'abcdef',  # not working
]

# Treat each input as a one-row CSV "file" that uses ':' as the delimiter.
for idx, el in enumerate(inputs, start=1):
    print(idx, tuple(next(csv.reader([el], delimiter=':'))))

This gives you:

1 ('abc', 'def')
2 ('abc', 'def')
3 ('abc', 'def')
4 ('abc', 'def')
5 ('a:bc', 'abc')
6 ('abc', 'a:bc')
7 ('abcdef',)
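
Note that csv.reader treats only the double quote as its quote character by default, so the snippet above does not cover the single-quoted inputs mentioned in the question. If a regular expression is still preferred, the following is a minimal sketch of one possible approach, not the method from the original answer. It assumes a quoted token never contains its own quote character, and the names _TOKEN, _PAIR, _unquote and split_pair are hypothetical helpers introduced here for illustration. Each token is matched as a whole (quoted or bare), so a colon inside quotes is never taken as the separator, and the quotes are stripped afterwards instead of being captured in separate groups.

```python
import re

# One token: a double-quoted run, a single-quoted run, or a bare run
# that contains no colon and no quote characters.
_TOKEN = r"""(?:"[^"]*"|'[^']*'|[^:'"]+)"""

# A first token, optionally followed by ':' and a second token.
_PAIR = re.compile(rf"({_TOKEN})(?::({_TOKEN}))?")


def _unquote(tok):
    """Strip one pair of matching surrounding quotes, if present."""
    if len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "'\"":
        return tok[1:-1]
    return tok


def split_pair(text):
    """Return a tuple with the one or two tokens found in text."""
    m = _PAIR.fullmatch(text)
    if m is None:
        raise ValueError(f"cannot parse {text!r}")
    # The second group is None when the input has no ':' separator.
    return tuple(_unquote(g) for g in m.groups() if g is not None)


if __name__ == "__main__":
    for s in ['abc:def', 'abc:"a:bc"', '"abcdef"', "'abc':'def'"]:
        print(s, '->', split_pair(s))
```

Because the pattern uses fullmatch, an input with more than two tokens, such as a:b:c, raises ValueError rather than being silently truncated.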

This page, python - Regex greediness problem, is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/15343228/
