python - Unicode re.sub() doesn't work with \g<0> (group 0)

Why doesn't \g<0> work with a unicode regex?

When I use \g<0> with a plain byte-string regex to insert a space before and after each match, it works:

>>> import re
>>> punct = """,.:;!@#$%^&*(){}{}|\/?><"'"""
>>> rx = re.compile('[%s]' % re.escape(punct))
>>> text = '''"anständig"'''
>>> rx.sub(r" \g<0> ",text)
' " anst\xc3\xa4ndig " '
>>> print rx.sub(r" \g<0> ",text)
 " anständig "

But with a unicode regex, no spaces are added:

>>> punct = u""",–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\«¿¨{}|"""
>>> rx = re.compile("["+"".join(punct)+"]", re.UNICODE)
>>> text = """„anständig“"""
>>> rx.sub(ur" \g<0> ", text)
'\xe2\x80\x9eanst\xc3\xa4ndig\xe2\x80\x9c'
>>> print rx.sub(ur" \g<0> ", text)
„anständig“

1. How do I get \g<0> to work in a unicode regex?
2. If (1) is not possible, how do I get the unicode regex to insert spaces before and after the characters in punct?

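Best answer

I think you have two errors. First, you are not escaping punct with re.escape as you did in the first example, and it contains characters such as [ and ] that need to be escaped inside a character class. Second, the text variable is not unicode. A working example: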

>>> punct = re.escape(u""",–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\«¿¨{}|""")
>>> rx = re.compile("["+"".join(punct)+"]", re.UNICODE)
>>> text = u"""„anständig“"""
>>> print rx.sub(ur" \g<0> ", text)
 „ anständig “
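
For readers on Python 3, the same fix can be sketched roughly as follows. This is only an illustration, not part of the original answer: it assumes Python 3, where every str is already Unicode and the ur"" prefix no longer exists, and the names PUNCT, build_punct_regex and pad_punct are made up for the example. The last lines rebuild the unescaped pattern from the question to show that it leaves the text untouched.

```python
import re

# The punctuation set from the answer. The backslash is doubled here because
# Python 3 warns about the unrecognised escape "\«" in a normal string literal.
PUNCT = """,–−—’‘‚”“‟„!£"%$'&)(+*-€/.±°´·¸;:=<?>@§#¡•[˚]»_^`≤…\\«¿¨{}|"""


def build_punct_regex(chars):
    """Compile a character class that matches any single character in chars."""
    # re.escape neutralises ], \, ^ and - so that none of them can close the
    # class early, escape the next character, or create a range such as *-€.
    return re.compile("[" + re.escape(chars) + "]")


def pad_punct(text, rx):
    """Put one space before and after every match; \\g<0> is the whole match."""
    return rx.sub(r" \g<0> ", text)


rx = build_punct_regex(PUNCT)
padded = pad_punct("„anständig“", rx)
print(repr(padded))

# Without re.escape the first "]" (right after "˚") ends the character class,
# and the rest of the set is parsed as ordinary pattern text, so nothing in
# the sample text matches and it comes back unchanged.
unescaped_rx = re.compile("[" + PUNCT + "]")
unescaped_result = unescaped_rx.sub(r" \g<0> ", "„anständig“")
print(repr(unescaped_result))
```

In Python 3 the second error from the answer shows up differently: applying a str pattern to a bytes object raises TypeError instead of silently failing to match, so a byte-string text would need to be decoded first.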

Source: a similar question on Stack Overflow, "python - Unicode re.sub() doesn't work with \g<0> (group 0)": https://stackoverflow.com/questions/19427548/
