python - Regular expression in Python, can this be improved?

I have this piece of code that finds words beginning with @ or #:

p = re.findall(r'@\w+|#\w+', str)

What bothers me is the repeated \w+. I am sure there is a way to write something like

p = re.findall(r'(@|#)\w+', str)

that gives the same result, but it does not: it returns only # and @. How can I change the regex so that \w+ is not repeated? This code comes close,

p = re.findall(r'((@|#)\w+)', str)

but it returns [('@many', '@'), ('@this', '@'), ('#tweet', '#')] (note the extra '@', '@' and '#').

Also, if I run this re.findall code 500,000 times, can it be compiled into a pattern so that it runs faster?

Best answer

Solution

You have two options:

Use a non-capturing group: (?:@|#)\w+
Or, even better, a character class: [@#]\w+
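
On the second part of the question: a pattern can be compiled once with re.compile and the resulting object reused inside the loop. The re module also caches recently used patterns internally, so the gain may be modest; measuring with timeit on real data is the safest way to know. A minimal sketch, where TAG_RE, find_tags and the sample text are hypothetical names chosen for illustration:

```python
import re

# Compile once, outside any loop; TAG_RE is an illustrative name.
TAG_RE = re.compile(r'[@#]\w+')

def find_tags(text):
    """Return every word in text that starts with @ or #."""
    return TAG_RE.findall(text)

sample = 'so @many people like @this #tweet'
print(find_tags(sample))  # ['@many', '@this', '#tweet']
```

The compiled object exposes the same findall method as the module-level function, so the calling code barely changes.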

References

regular-expressions.info/Character Class and Groups

Understanding `findall`

The problem you are running into comes from how findall returns matches depending on how many capturing groups the pattern contains.

Let's take a closer look at this pattern (annotated to show the groups):

((@|#)\w+)
|\___/   |
|group 2 |     # Read about groups to understand
\________/     # how they're defined and numbered/named
 group 1

Capturing groups let us save what a subpattern matched within the overall pattern.

import re

p = re.compile(r'((@|#)\w+)')
m = p.match('@tweet')
print(m.group(1))
# @tweet
print(m.group(2))
# @
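
As a small sketch of how the numbering works: groups are numbered by the position of their opening parenthesis, left to right, and group 0 always refers to the whole match. The variable names below are illustrative only.

```python
import re

pattern = re.compile(r'((@|#)\w+)')
m = pattern.match('@tweet')

# Group 0 is the entire match; groups 1 and 2 follow the order
# of their opening parentheses, left to right.
whole = m.group(0)
numbered = m.groups()

print(whole)           # @tweet
print(numbered)        # ('@tweet', '@')
print(pattern.groups)  # 2, the number of capturing groups
```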

Now let's look at the Python documentation for the re module:

findall: Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

This explains why you get the following:

str = 'lala @tweet boo #this &that @foo#bar'

print(re.findall(r'((@|#)\w+)', str))
# [('@tweet', '@'), ('#this', '#'), ('@foo', '@'), ('#bar', '#')]

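As specified, because the pattern has more than one group, findall returns a list of tuples, one per match. Each tuple gives you what the groups captured for that match.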

The documentation also explains why you get the following:

print(re.findall(r'(@|#)\w+', str))
# ['@', '#', '@', '#']

Now the pattern has only one group, so findall returns a list of what that group matched.

By contrast, the patterns given above as solutions have no capturing groups, which is why they work as you expect:

print(re.findall(r'(?:@|#)\w+', str))
# ['@tweet', '#this', '@foo', '#bar']

print(re.findall(r'[@#]\w+', str))
# ['@tweet', '#this', '@foo', '#bar']
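
If a capturing group has to stay in the pattern for some other reason, one alternative is re.finditer, which yields match objects instead of group contents, so the whole match is always available as group 0. A minimal sketch, with text and full_matches as illustrative names:

```python
import re

text = 'lala @tweet boo #this &that @foo#bar'

# finditer yields match objects, so group(0) (the whole match)
# is available no matter how many capturing groups the pattern has.
full_matches = [m.group(0) for m in re.finditer(r'(@|#)\w+', text)]
print(full_matches)  # ['@tweet', '#this', '@foo', '#bar']
```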

Attachments

Snippet with output on ideone.com

Regarding "python - Regular expression in Python, can this be improved?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/2960969/
