python - regex - different output in Python 2.6 and 3.3

When I run the same regex code, I get different output in Python 2 and Python 3.

Suppose this is the data I want; it appears somewhere on a web page.

source = ['\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e',
          '\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e', 
          '\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e', 
          '\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e']

When I run the following code in Python 2.6, it works fine: I get exactly the output shown above.

match = re.findall("\x1e\x1e\S+",source)

But when I run it in Python 3.3, like this:

match = re.findall("\x1e\x1e\S+", str(source))

the match variable contains:

['\x1e\x1e5.5.30-log', '\x1e\x1e5.5.30-log', '\x1e\x1e5.5.30-log','\x1e\x1e5.5.30-log']

So can you tell me why Python 3 does not capture the whole string? Why is \x1epcofiowa@localhost\x1epcofiowa_pci\x1e skipped every time? I want the same output as in Python 2.6.

I'm out of ideas at this point. Looking forward to your answer. Thanks.

Best answer

The difference comes from how \S is defined in Python 2 versus Python 3.

According to the Python 3 re module docs:

\S - Matches any character which is not a Unicode whitespace character. This is the opposite of \s. If the ASCII flag is used this becomes the equivalent of [^ \t\n\r\f\v] (but the flag affects the entire regular expression, so in such cases using an explicit [^ \t\n\r\f\v] may be a better choice).
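To see the documented difference on the exact character involved, here is a minimal sketch for Python 3 (the name `sep` is hypothetical) that checks U+001E against `\s` and `\S`, with and without `re.ASCII`. Per the Python docs, `str.isspace()` treats a character as whitespace when its Unicode bidirectional class is WS, B or S, and U+001E has class B; the re module's Unicode `\s` agrees with `str.isspace()` here.

```python
import re
import unicodedata

sep = '\x1e'  # U+001E, the RECORD SEPARATOR control character

# Bidirectional class B (paragraph separator) makes it whitespace for str.isspace()
print(unicodedata.bidirectional(sep))  # B
print(sep.isspace())                   # True

# Default (Unicode) str pattern: \s matches it, so \S does not
print(re.match(r'\s', sep) is not None)  # True
print(re.match(r'\S', sep) is not None)  # False

# With re.ASCII, \s is only [ \t\n\r\f\v], so \S matches U+001E
print(re.match(r'\S', sep, re.ASCII) is not None)  # True
```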

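Now, the \x1e (U+001E) that follows \x1e\x1e5.5.30-log in your data is a Unicode whitespace character (see the ActiveState reference), so in Python 3 it is not matched by \S. Each match therefore stops right after 5.5.30-log.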
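Following the two options the docs mention, here is a minimal Python 3 sketch comparing the default pattern with the `re.ASCII` flag and with an explicit `[^ \t\n\r\f\v]` class. It assumes the input is a single `str` rather than a list; `record` is a hypothetical name holding one element of the question's data.

```python
import re

# Hypothetical input: one element of the question's list, as a single str
record = '\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e'

# Default Unicode \S stops at the first \x1e after the version string
default = re.findall(r'\x1e\x1e\S+', record)

# re.ASCII narrows \s to [ \t\n\r\f\v] for the whole pattern, so \S matches \x1e
with_ascii = re.findall(r'\x1e\x1e\S+', record, flags=re.ASCII)

# Explicit class: same result without changing the meaning of other escapes
explicit = re.findall(r'\x1e\x1e[^ \t\n\r\f\v]+', record)

print(default)                 # ['\x1e\x1e5.5.30-log']
print(with_ascii == [record])  # True
print(explicit == [record])    # True
```

As the docs note, `re.ASCII` affects every escape in the pattern, so the explicit class is the narrower fix when the pattern also relies on Unicode-aware `\w` or `\b`.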


Whereas in Python 2:

\S - Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].

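So only the characters in that explicit class count as whitespace. \x1e is not one of them, so \S matches it and the whole string is captured.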
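In Python 3, a pattern built from `bytes` behaves like the Python 2 `str` pattern described above: for 8-bit patterns the docs define `\s` as the ASCII class `[ \t\n\r\f\v]`, so `\S` matches `\x1e`. A minimal sketch, assuming the page content is still available as undecoded bytes (the name `raw` is hypothetical):

```python
import re

# Hypothetical raw bytes, e.g. an undecoded HTTP response body
raw = b'\x1e\x1e5.5.30-log\x1epcofiowa@localhost\x1epcofiowa_pci\x1e'

# Bytes patterns use the ASCII-only definition of \s, as Python 2 str patterns did
matches = re.findall(rb'\x1e\x1e\S+', raw)
print(matches == [raw])  # True

# Decode afterwards if text is needed
text = matches[0].decode('ascii')
```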

A similar question on Stack Overflow: https://stackoverflow.com/questions/14768922/
