python - How to detect the byte encoding of a string?

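os.listdir() returns about 1000 filenames; some are encoded as UTF-8 and some as CP1252.

I want to decode all of them to Unicode for further processing in my script. Is there a way to determine each name's source encoding so that it decodes to Unicode correctly?

Example: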

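import os

for item in os.listdir(rootPath):
    # Python 2: names come back as byte strings (str) when rootPath is a byte string
    if isinstance(item, str):
        # The encoding has to be guessed here, which is the problem
        item = item.decode('cp1252')  # or item = item.decode('utf-8')
    print item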

Best answer

Use the chardet library. It is super simple:

import chardet

the_encoding = chardet.detect('your string')['encoding']

That's it!

In Python 3 you need to pass a bytes or bytearray object, so:

import chardet
the_encoding = chardet.detect(b'your string')['encoding']
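
For the filenames in the question, the detection step could be wrapped roughly as follows. This is only a sketch for Python 3, not part of the original answer: `decode_name`, `list_decoded`, the `min_confidence` threshold of 0.5 and the CP1252 fallback are hypothetical choices made for illustration. It also tries strict UTF-8 before asking chardet, which is an added shortcut based on the fact that non-ASCII CP1252 text is rarely valid UTF-8.

```python
import os

try:
    import chardet  # third-party package: pip install chardet
except ImportError:  # keep the sketch usable when chardet is not installed
    chardet = None


def decode_name(raw, fallback="cp1252", min_confidence=0.5):
    """Decode a bytes filename to str (hypothetical helper)."""
    # Strict UTF-8 first: a successful decode is a strong signal.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    # Otherwise ask chardet, and only trust a reasonably confident guess.
    if chardet is not None:
        guess = chardet.detect(raw)  # dict with 'encoding' and 'confidence'
        if guess["encoding"] and guess["confidence"] >= min_confidence:
            try:
                return raw.decode(guess["encoding"])
            except (UnicodeDecodeError, LookupError):
                pass
    # Last resort: errors="replace" because CP1252 leaves a few bytes undefined.
    return raw.decode(fallback, errors="replace")


def list_decoded(root):
    """Return the entries of root as str, decoding each raw name."""
    # Passing a bytes path makes os.listdir() return bytes names on Python 3.
    return [decode_name(raw) for raw in os.listdir(os.fsencode(root))]
```

Calling `list_decoded(rootPath)` then yields text names regardless of which of the two encodings each file used. Detection on short inputs such as filenames is a statistical guess, so the confidence check and the fallback matter more here than they would for long documents.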

This question, "python - How to detect the byte encoding of a string?", corresponds to a similar question on Stack Overflow: https://stackoverflow.com/questions/15918314/
