python - "Nothing to repeat" from Python regex

Here is a regex, tried first with egrep and then with Python 2.7:

$ echo '/some/path/to/file/abcde.csv' | egrep '*([a-zA-Z]+).csv'

/some/path/to/file/abcde.csv

However, the same regex in Python:

re.match(r'*([a-zA-Z]+)\.csv',f )

gives:

Traceback (most recent call last):
  File "/shared/OpenChai/bin/plothost.py", line 26, in <module>
    hosts = [re.match(r'*([a-zA-Z]+)\.csv',f ).group(1) for f in infiles]
  File "/usr/lib/python2.7/re.py", line 141, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

A search turned up what appears to be a Python bug:

regex error - nothing to repeat

It seems to be a python bug (that works perfectly in vim). The source of the problem is the (\s*...)+ bit.

However, it is not clear to me what the workaround is for my regex shown above, to make Python happy.

Thanks.

Best answer

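You do not need the * in the pattern. It causes the error because you are trying to quantify the beginning of the pattern, where there is nothing (only an empty string) to quantify.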

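The same "nothing to repeat" error occurs when you:

put any quantifier (+, ?, *, {2}, {4,5}, etc.) at the start of the pattern (e.g. re.compile(r'?'))
add any quantifier right after the ^ or \A start-of-string anchor (e.g. re.compile(r'^*'))
add any quantifier right after the $ or \Z end-of-string anchor (e.g. re.compile(r'$*'))
add any quantifier right after a word boundary (e.g. re.compile(r'\b*\d{5}'))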
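
The cases listed above can be checked with a short sketch like the one below. The helper name compile_error is made up for illustration. The exact wording of the message depends on the Python version (Python 3 appends the position of the offending quantifier), so the sketch only prints whatever re.compile reports.

```python
import re

# Patterns from the cases above, plus the pattern from the question.
# Each has a quantifier with nothing quantifiable in front of it.
bad_patterns = [r'?', r'^*', r'$*', r'\b*\d{5}', r'*([a-zA-Z]+)\.csv']


def compile_error(pattern):
    """Return the message raised by re.compile, or None if the pattern compiles."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


for pattern in bad_patterns:
    print(repr(pattern), '->', compile_error(pattern))
```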

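Note, however, that in Python's re you can quantify any lookaround: for example, (?<!\d)*abc and (?<=\d)?abc will yield the same matches, because the quantified lookarounds are optional.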

Use

([a-zA-Z]+)\.csv

or, to match the whole string:

.*([a-zA-Z]+)\.csv
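
As a rough sketch of how these two patterns behave on the sample path from the question (the variable names are illustrative): re.match only matches at the start of the string, so the first pattern needs re.search here. With the second pattern, the greedy .* gives back as little as possible, which leaves only the last letter in the group. The lazy .*? variant is an addition that is not part of the answer; it is shown only for comparison.

```python
import re

path = '/some/path/to/file/abcde.csv'

# The bare pattern does not match at position 0 (the path starts with '/'),
# so re.match returns None; re.search scans forward and finds the name.
at_start = re.match(r'([a-zA-Z]+)\.csv', path)
searched = re.search(r'([a-zA-Z]+)\.csv', path).group(1)

# Greedy .* consumes as much as it can, so the group keeps one letter.
greedy = re.match(r'.*([a-zA-Z]+)\.csv', path).group(1)

# Lazy .*? consumes as little as it can, so the group keeps the whole name.
lazy = re.match(r'.*?([a-zA-Z]+)\.csv', path).group(1)

print(at_start, searched, greedy, lazy)
```

If the goal is to extract the whole file name, as in the list comprehension from the question, re.search with the first pattern or re.match with the lazy variant would be the options to consider.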


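The reason is that the * is unescaped and is therefore treated as a quantifier. A quantifier applies to the preceding subpattern in the regex. Here it is used at the very start of the pattern, so there is nothing for it to quantify, and the "nothing to repeat" error is raised.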
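
For contrast, here is a small sketch of what escaping changes, assuming a made-up input string that really does start with an asterisk. Once escaped, the * is an ordinary character and the pattern compiles.

```python
import re

# Escaped, the asterisk is matched literally instead of acting as a quantifier.
literal = re.search(r'\*([a-zA-Z]+)\.csv', '*abcde.csv')
print(literal.group(0), literal.group(1))

# re.escape produces the escaped form of a string for use inside a pattern.
escaped_star = re.escape('*')
print(escaped_star)
```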

If it "works" in Vim, that is only because the Vim regex engine ignores this subpattern (the same way Java handles an unescaped [ and ] inside a character class such as [([)]]).

Regarding python - "Nothing to repeat" from Python regex, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/31386552/
