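python - Extract substring between single quotes

I'm new to Python and I'm trying to extract the substrings between single quotes. Do you know how to do this with a regular expression?

Example input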

 text = "[(u'apple',), (u'banana',)]"

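I want to extract apple and banana as list items, like ['apple', 'banana']

Best answer

In the general case, to extract any characters between single quotes, the most efficient regex approach is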

re.findall(r"'([^']*)'", text) # to also extract empty values
re.findall(r"'([^']+)'", text) # to only extract non-empty values

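See the regex demo.

Details

' - a single quote (no need to escape it inside a double-quoted string literal)
([^']*) - a capturing group that captures any 0+ characters other than ' (or 1+ if you use the + quantifier); [^...] is a negated character class that matches any character other than those specified in the class
' - the closing single quote.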
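To make the difference between the two quantifiers concrete, here is a small sketch on a made-up sample string (the sample and the variable names are assumptions, not part of the original question). It also shows one thing worth checking in your own data: when an empty pair '' is followed by more quoted values, the + variant skips the first quote of the empty pair and can end up pairing the wrong quotes, so filtering the * result may be the safer way to drop empty values.

```python
import re

# Hypothetical sample containing an empty quoted value.
sample = "['', 'apple', 'banana']"

# '*' allows zero characters between the quotes, so the empty pair is kept.
with_empty = re.findall(r"'([^']*)'", sample)
print(with_empty)  # ['', 'apple', 'banana']

# '+' needs at least one character, so the first quote of '' cannot start a
# match; the second quote then pairs with the opening quote of 'apple'.
shifted = re.findall(r"'([^']+)'", sample)
print(shifted)  # [', ', ', ']

# Dropping empty strings from the '*' result keeps the pairing intact.
non_empty = [value for value in with_empty if value]
print(non_empty)  # ['apple', 'banana']
```

On text without empty pairs, such as the input from the question, both patterns return the same list.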

Note that re.findall returns only the captured substrings if a capturing group is specified in the pattern:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.
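A short sketch of that behavior, using the input from the question. The second capturing group around the u prefix is added purely for illustration and is not something the answer itself uses.

```python
import re

text = "[(u'apple',), (u'banana',)]"

# No capturing group: findall returns the whole match, quotes included.
no_group = re.findall(r"'[^']*'", text)
print(no_group)  # ["'apple'", "'banana'"]

# One capturing group: findall returns only what the group captured.
one_group = re.findall(r"'([^']*)'", text)
print(one_group)  # ['apple', 'banana']

# Two capturing groups: findall returns a list of tuples, one item per group.
two_groups = re.findall(r"(u)'([^']*)'", text)
print(two_groups)  # [('u', 'apple'), ('u', 'banana')]
```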

Python demo:

import re
text = "[(u'apple',), (u'banana',)]"
print(re.findall(r"'([^']*)'", text))
# => ['apple', 'banana']

Escaped quote support

If you need to support escaped quotes (so as to match abc\'def in 'abc\'def'), you will need a regex like

re.findall(r"'([^'\\]*(?:\\.[^'\\]*)*)'", text, re.DOTALL) # in case the text contains only "valid" pairs of quotes
re.findall(r"(?<!\\)(?:\\\\)*'([^'\\]*(?:\\.[^'\\]*)*)'", text, re.DOTALL) # if your text is too messed up and there can be "wild" single quotes out there

See the regex variation 1 and regex variation 2 demos.
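As a rough illustration of when the second variation matters, the sketch below runs both patterns on a made-up string in which a stray escaped quote precedes a properly quoted value. The sample text is an assumption chosen to show the difference.

```python
import re

# Hypothetical sample: a "wild" escaped quote (\') followed by a real quoted value.
text = r"\'abc 'def'"

# Variation 1 treats the escaped quote as an opening quote.
variant_1 = re.findall(r"'([^'\\]*(?:\\.[^'\\]*)*)'", text, re.DOTALL)
print(variant_1)  # ['abc ']

# Variation 2 refuses to start a match at a quote preceded by a single backslash.
variant_2 = re.findall(r"(?<!\\)(?:\\\\)*'([^'\\]*(?:\\.[^'\\]*)*)'", text, re.DOTALL)
print(variant_2)  # ['def']
```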

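Pattern details

(?<!\\) - a negative lookbehind that fails the match if there is a backslash immediately to the left of the current position
(?:\\\\)* - 0 or more consecutive double backslashes (since these do not escape the neighboring character)
' - an opening '
([^'\\]*(?:\\.[^'\\]*)*) - Group 1 (what re.findall will return) matching...
- [^'\\]* - 0 or more characters other than ' and \
- (?: - start of a non-capturing group that matches
  - \\. - any escaped character (a backslash and any character, including a newline thanks to the re.DOTALL modifier)
  - [^'\\]* - 0 or more characters other than ' and \
- )* - ... zero or more times
' - a closing '.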
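The breakdown above can also be written directly into the pattern with re.VERBOSE, which may be easier to maintain. This is only a sketch: the names QUOTED and unescape, and the sample text, are hypothetical, and the unescaping step is an optional extra that the answer does not cover.

```python
import re

# Same pattern as the second variation, spelled out with inline comments.
QUOTED = re.compile(r"""
    (?<!\\)          # no backslash immediately to the left
    (?:\\\\)*        # zero or more pairs of backslashes
    '                # opening quote
    (                # group 1: the quoted content
        [^'\\]*      # characters other than a quote or a backslash
        (?:
            \\.      # an escaped character (backslash plus any character)
            [^'\\]*  # more characters other than a quote or a backslash
        )*
    )
    '                # closing quote
""", re.VERBOSE | re.DOTALL)


def unescape(value):
    """Hypothetical helper: drop the backslash in front of each escaped character."""
    return re.sub(r"\\(.)", r"\1", value, flags=re.DOTALL)


text = r"(u'abc\'def',) (u'apple',)"
raw_values = QUOTED.findall(text)
print(raw_values)  # ["abc\\'def", 'apple']

clean_values = [unescape(value) for value in raw_values]
print(clean_values)  # ["abc'def", 'apple']
```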

See another Python demo:

import re
text = r"[(u'apple',), (u'banana',)] [(u'apple',), (u'banana',), (u'abc\'def',)] \\'abc''def' \\\'abc   'abc\\\\\'def'"
print(re.findall(r"(?<!\\)(?:\\\\)*'([^'\\]*(?:\\.[^'\\]*)*)'", text))
# => apple, banana, apple, banana, abc\'def, abc, def, abc\\\\\'def

This answer to "python - Extract substring between single quotes" comes from a similar question on Stack Overflow: https://stackoverflow.com/questions/29152848/
