Python regex: £ to char

I'm writing a program that searches a file for £ symbols:

    import re

    r = re.compile(r"£\S*£")

    def parseData(self):
        # No encoding= argument, so Python uses the platform default encoding
        with open(file, 'r') as f:
            fs = f.read()
        res = r.findall(fs)
        return res

For some reason, my output contains extra symbols, e.g. £foo £, where the file contains £foo£.

I'm using Python 3.4.3, if that helps.

Full file: http://pastebin.com/L7hjeg6A

Best answer

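The problem is that the file is saved in one encoding, but you are opening it with a different one. Most likely the file is UTF-8, but you are opening it with some ANSI code page (I see the same problem with £Latitude£ in Notepad++ when I switch the encoding from UTF-8 to ANSI). An example that shows the same behavior: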

My a.txt:

£Latitude£

Code:

    >>> f = open('a.txt','r')
    >>> s = f.read()
    >>> s
    '\xc2£Latitude\xc2£'

    >>> f = open('a.txt','r',encoding='utf-8')
    >>> s = f.read()
    >>> s
    '£Latitude£'
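The `\xc2` in the first output is the character Â (U+00C2). The short sketch below, which needs no file, shows where it comes from: UTF-8 stores £ (U+00A3) as the two bytes C2 A3, and a single-byte code page decodes each byte as its own character. It assumes cp1252 as the "ANSI" code page, which is typical for Western-locale Windows but may differ on your machine.

```python
# One '£' written as UTF-8 becomes two characters when the bytes
# are decoded with a single-byte code page (cp1252 is an assumption
# standing in for the "ANSI" default).
pound = "£"                         # U+00A3 POUND SIGN
raw = pound.encode("utf-8")         # UTF-8 needs two bytes for U+00A3
print(raw)                          # b'\xc2\xa3'

# cp1252 maps each byte to its own character: 0xC2 -> 'Â', 0xA3 -> '£'
misread = raw.decode("cp1252")
print(misread, len(misread))        # Â£ 2

# The same thing happens to the a.txt line from the answer
line = "£Latitude£".encode("utf-8").decode("cp1252")
print(line)                         # Â£LatitudeÂ£

# Decoding with the encoding the bytes were written in restores '£'
print(raw.decode("utf-8"))          # £
```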

You need to open the file with the correct encoding by passing it as the encoding argument to open(), as done above.

From the documentation of open():

encoding is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever locale.getpreferredencoding() returns), but any text encoding supported by Python can be used. See the codecs module for the list of supported encodings.
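Applied to the question's code, the fix is to pass encoding= to open(). The sketch below is a standalone function rather than the original method; the name `parse_data`, its `encoding` parameter, and the sample file contents are hypothetical. It assumes the file was saved as UTF-8, and uses cp1252 only to reproduce the wrong decoding deterministically instead of relying on the platform default.

```python
import os
import re
import tempfile

# Same pattern as in the question
POUND_FIELD = re.compile(r"£\S*£")


def parse_data(path, encoding="utf-8"):
    """Return all £...£ fields in the file at `path`.

    `encoding` is passed straight to open(); UTF-8 is an assumption
    about how the file was saved.
    """
    with open(path, "r", encoding=encoding) as f:
        return POUND_FIELD.findall(f.read())


if __name__ == "__main__":
    # Write a small UTF-8 sample file (hypothetical content)
    with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                     delete=False) as tmp:
        tmp.write("£foo£ bar £Latitude£\n")
        sample = tmp.name
    try:
        print(parse_data(sample))                     # ['£foo£', '£Latitude£']
        # Wrong codec: each closing £ picks up a stray 'Â' in front of it
        print(parse_data(sample, encoding="cp1252"))  # ['£fooÂ£', '£LatitudeÂ£']
    finally:
        os.remove(sample)
```

With the wrong codec the opening 'Â' falls outside the match (the pattern starts at the first £), but the one before the closing £ is caught by `\S*`, which is why the stray character shows up inside the matched text.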

For the question "Python regex £ to char", see the corresponding question on Stack Overflow: https://stackoverflow.com/questions/31937272/
