Python os.walk and Unicode errors

Two questions: 1. Why does

In [21]:                                                                                   
   ....:     for root, dir, file in os.walk(spath):
   ....:         print(root)

print the whole tree, but

In [6]: for dirs in os.walk(spath):                             
...:     print(dirs)

choke on this Unicode error?

UnicodeEncodeError: 'charmap' codec can't encode character '\u2122' in position 1477: character maps to <undefined>

[Note: this is the TM symbol]

2. I looked at these answers

Scraping works well until I get this error: 'ascii' codec can't encode character u'\u2122' in position

What's the deal with Python 3.4, Unicode, different languages and Windows?

python 3.2 UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>

https://github.com/Drekin/win-unicode-console

https://docs.python.org/3/search.html?q=IncrementalDecoder&check_keywords=yes&area=default

and tried these variations:

----> 1 print(dirs, encoding='utf-8')                                                           
TypeError: 'encoding' is an invalid keyword argument for this function       
In [11]: >>> u'\u2122'.encode('ascii', 'ignore')                                                
Out[11]: b''                       

print(dirs).encode(‘utf=8’)

None of them worked.

This was done on Windows 10 with Python 3.4.3 and Visual Studio Code 1.6.1. The default settings in Visual Studio Code include:

// The default character set encoding to use when reading and writing files. "files.encoding": "utf8",

python 3.4.3 Visual Studio Code 1.6.1 ipython 3.0.0

UPDATE: I tried running the script again in the Sublime Text REPL. This is what I got:

# -*- coding: utf-8 -*-
import os

spath = 'C:/Users/Semantic/Documents/Align' 

with open('os_walk4_align.txt', 'w') as f:
    for path, dirs, filenames in os.walk(spath):
        print(path, dirs, filenames, file=f)

Traceback (most recent call last):
File "listdir_test1.py", line 8, in <module>
print(path, dirs, filenames, file=f)
File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2605' in position 300: character maps to <undefined>

The code is only 217 characters long, so where does "position 300" come from?

Best answer

Here is a test case:

C:\TEST
├───dir1
│       file1™
│
└───dir2
        file2

Here is a script (Python 3.x):

import os

spath = r'c:\test'

for root,dirs,files in os.walk(spath):
    print(root)

for dirs in os.walk(spath):                             
    print(dirs)

Here is the output in an IDE that supports UTF-8 (PythonWin in this case):

c:\test
c:\test\dir1
c:\test\dir2
('c:\\test', ['dir1', 'dir2'], [])
('c:\\test\\dir1', [], ['file1™'])
('c:\\test\\dir2', [], ['file2'])

Here is the output on my Windows console, which defaults to cp437:

c:\test
c:\test\dir1
c:\test\dir2
('c:\\test', ['dir1', 'dir2'], [])
Traceback (most recent call last):
  File "C:\test.py", line 9, in <module>
    print(dirs)
  File "C:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2122' in position 47: character maps to <undefined>

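For question 1, print(root) works because no directory name contains a character that the output encoding cannot represent. print(dirs), on the other hand, prints a whole (root, dirs, files) tuple, and one of the file names contains a character that the Windows console encoding does not support.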
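The difference can be reproduced without a console or a real directory tree. The following is a minimal sketch, assuming a hand-built tuple that mirrors one os.walk() entry from the test case; the variable names are illustrative and not part of the original answer. It also shows that the "position" in the error message is an index into the text being encoded.

```python
# Sketch: reproduce the failure without touching the file system.
# The tuple is a hand-built stand-in for one os.walk() entry.
entry = ('c:\\test\\dir1', [], ['file1\u2122'])

root_text = str(entry[0])   # the text print(root) would send to the console
tuple_text = str(entry)     # the text print(dirs) would send to the console

# The root path is plain ASCII, so cp437 can encode it.
root_bytes = root_text.encode('cp437')

# The tuple text contains U+2122, which cp437 has no byte for.
try:
    tuple_text.encode('cp437')
    failed_at = None
except UnicodeEncodeError as exc:
    # exc.start is an index into tuple_text, not into the script source.
    failed_at = exc.start

print(root_bytes)
print(failed_at, ascii(tuple_text[failed_at]))
```

Both print calls emit ASCII only, so the sketch itself should run on a cp437 console.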

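For question 2, one attempt misspelled utf-8 as utf=8 (and print() returns None, so there is nothing to call .encode() on afterwards). The script from the update did not declare an encoding for the output file, so open() fell back to the default, cp1252 according to the traceback, which does not support the character. As for "position 300": it is an offset into the text being encoded, not into the script's source.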
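If the goal is to print to the console rather than to a file, one option that the original answer does not cover is to escape whatever the console encoding cannot represent. The sketch below uses a hypothetical helper, console_safe, built on the standard backslashreplace error handler; the name and the fallback to ASCII are assumptions made for illustration.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks shown as escapes."""
    # Hypothetical helper: defaults to the console's encoding, then to ASCII.
    encoding = encoding or sys.stdout.encoding or 'ascii'
    # backslashreplace swaps each unencodable character for a backslash
    # escape instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors='backslashreplace').decode(encoding)

# Hand-built stand-in for one os.walk() entry from the test case.
entry = ('c:\\test\\dir1', [], ['file1\u2122'])
print(console_safe(str(entry), 'cp437'))
```

The trade-off is that the output is no longer the real file name, so this suits logging and inspection better than anything that feeds the names back into file operations.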

Try this:

import os

spath = r'c:\test'

with open('os_walk4_align.txt', 'w', encoding='utf8') as f:
    for path, dirs, filenames in os.walk(spath):
        print(path, dirs, filenames, file=f)

Contents of os_walk4_align.txt, encoded in UTF-8:

c:\test ['dir1', 'dir2'] []
c:\test\dir1 [] ['file1™']
c:\test\dir2 [] ['file2']
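The same explicit encoding is needed when reading the file back, since the platform default may differ from UTF-8 on Windows. Here is a small round-trip sketch; write_listing is a hypothetical helper, and the hand-built rows stand in for os.walk(spath) so that the example does not depend on a real directory tree.

```python
import os
import tempfile

def write_listing(rows, out_path):
    """Write (path, dirs, filenames) rows, such as os.walk() yields, as UTF-8."""
    with open(out_path, 'w', encoding='utf8') as f:
        for path, dirs, filenames in rows:
            print(path, dirs, filenames, file=f)

# Hand-built rows; in real use, pass os.walk(spath) instead.
rows = [
    ('c:\\test', ['dir1', 'dir2'], []),
    ('c:\\test\\dir1', [], ['file1\u2122']),
]

out_path = os.path.join(tempfile.mkdtemp(), 'listing.txt')
write_listing(rows, out_path)

# Read back with the same encoding the file was written in.
with open(out_path, encoding='utf8') as f:
    lines = f.read().splitlines()

# ascii() keeps the check printable on consoles that lack the TM symbol.
print(ascii(lines[1]))
```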

Source: the Stack Overflow question "Python os.walk and Unicode errors", https://stackoverflow.com/questions/40091750/
