python - Unicode filenames on Windows with Python and subprocess.Popen()

Why does the following happen:

>>> u'\u0308'.encode('mbcs')   #UMLAUT
'\xa8'
>>> u'\u041A'.encode('mbcs')   #CYRILLIC CAPITAL LETTER KA
'?'
>>>

I have a Python application that accepts filenames from the operating system. It works for some international users but not for others.

For example, this unicode filename: u'\u041a\u0433\u044b\u044b\u0448\u0444\u0442'

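will not encode with the Windows "mbcs" encoding (the encoding used by the filesystem, as returned by sys.getfilesystemencoding()). I get "????????", which indicates that the encoder fails on those characters. But this makes no sense, since the filename came from the user to begin with.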

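Update: here is the background to my reasoning... I have a file on my system with a name in Cyrillic. I want to call subprocess.Popen() with that file as an argument. Popen won't handle unicode. Normally I can get away with encoding the argument using the codec given by sys.getfilesystemencoding(). In this case it won't work.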

Best answer

In Py3K, at least from Python 3.2 onward, subprocess.Popen and sys.argv work consistently with (default unicode) strings on Windows. CreateProcessW and GetCommandLineW are evidently used.
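To illustrate the Python 3 behaviour, here is a minimal sketch that passes the Cyrillic name from the question to a child Python process and checks that it arrives unchanged. The helper name echo_argument is made up for this example, and the child simply prints the ASCII repr of its first argument.

```python
import subprocess
import sys


def echo_argument(arg):
    """Run a child Python process and return the ASCII repr of the argument it saw.

    echo_argument is a hypothetical helper written for this illustration.
    """
    # The child prints an ASCII-only repr, so the result does not depend on
    # the console or pipe encoding.
    child_code = "import sys; print(ascii(sys.argv[1]))"
    result = subprocess.run(
        [sys.executable, "-c", child_code, arg],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


name = "\u041a\u0433\u044b\u044b\u0448\u0444\u0442"
# Prints True when the argument reaches the child process intact.
print(echo_argument(name) == ascii(name))
```

Because the arguments are ordinary str objects, no manual encoding with sys.getfilesystemencoding() is involved.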

In Python 2, at least up to v2.7.2, subprocess.Popen is buggy with Unicode arguments. It sticks to CreateProcessA (whereas the os.* functions are consistent with Unicode), and shlex.split creates additional nonsense.
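The practical effect of going through an ANSI ("A") API is that arguments get squeezed into the user's ANSI code page, which is what "mbcs" refers to. The sketch below is written for Python 3 and, as an assumption, uses the fixed codecs cp1252 (Western European) and cp1251 (Cyrillic) as stand-ins for "mbcs" on two differently configured machines; the helper name encode_for_code_page is made up for the illustration.

```python
def encode_for_code_page(text, code_page):
    """Encode text for a code page, replacing unrepresentable characters with '?'.

    Hypothetical helper: cp1252 and cp1251 stand in for what "mbcs" would
    resolve to on a Western European and a Russian Windows installation.
    """
    return text.encode(code_page, errors="replace")


ka = "\u041a"  # CYRILLIC CAPITAL LETTER KA

print(encode_for_code_page(ka, "cp1252"))  # b'?'    (no Cyrillic in cp1252)
print(encode_for_code_page(ka, "cp1251"))  # b'\xca' (KA exists in cp1251)
```

This would explain why the same filename works for users whose system code page contains the characters and turns into question marks for everyone else.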

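Pywin32's win32process.CreateProcess also doesn't auto-switch to the W version, nor is there a win32process.CreateProcessW. The same goes for GetCommandLine. So ctypes.windll.kernel32.CreateProcessW... needs to be used. The subprocess module should perhaps be fixed regarding this issue.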
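As a rough idea of what calling CreateProcessW through ctypes could look like, here is a minimal sketch. The function names build_command_line and create_process_w are invented for this example, the structure layouts follow the Win32 STARTUPINFOW and PROCESS_INFORMATION definitions, and the sketch deliberately leaves out pipe redirection, environment handling and waiting for the child. It is an illustration under those assumptions, not the patched Popen mentioned in the answer.

```python
import ctypes
import subprocess
import sys


def build_command_line(args):
    """Join arguments into a single Windows command line string (hypothetical helper)."""
    return subprocess.list2cmdline(args)


def create_process_w(args):
    """Start a process via CreateProcessW and return its process id (Windows only)."""
    if sys.platform != "win32":
        raise OSError("CreateProcessW is only available on Windows")

    from ctypes import wintypes

    class STARTUPINFOW(ctypes.Structure):
        _fields_ = [
            ("cb", wintypes.DWORD),
            ("lpReserved", wintypes.LPWSTR),
            ("lpDesktop", wintypes.LPWSTR),
            ("lpTitle", wintypes.LPWSTR),
            ("dwX", wintypes.DWORD),
            ("dwY", wintypes.DWORD),
            ("dwXSize", wintypes.DWORD),
            ("dwYSize", wintypes.DWORD),
            ("dwXCountChars", wintypes.DWORD),
            ("dwYCountChars", wintypes.DWORD),
            ("dwFillAttribute", wintypes.DWORD),
            ("dwFlags", wintypes.DWORD),
            ("wShowWindow", wintypes.WORD),
            ("cbReserved2", wintypes.WORD),
            ("lpReserved2", ctypes.POINTER(wintypes.BYTE)),
            ("hStdInput", wintypes.HANDLE),
            ("hStdOutput", wintypes.HANDLE),
            ("hStdError", wintypes.HANDLE),
        ]

    class PROCESS_INFORMATION(ctypes.Structure):
        _fields_ = [
            ("hProcess", wintypes.HANDLE),
            ("hThread", wintypes.HANDLE),
            ("dwProcessId", wintypes.DWORD),
            ("dwThreadId", wintypes.DWORD),
        ]

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateProcessW.argtypes = [
        wintypes.LPCWSTR, wintypes.LPWSTR, wintypes.LPVOID, wintypes.LPVOID,
        wintypes.BOOL, wintypes.DWORD, wintypes.LPVOID, wintypes.LPCWSTR,
        ctypes.POINTER(STARTUPINFOW), ctypes.POINTER(PROCESS_INFORMATION),
    ]
    kernel32.CreateProcessW.restype = wintypes.BOOL
    kernel32.CloseHandle.argtypes = [wintypes.HANDLE]
    kernel32.CloseHandle.restype = wintypes.BOOL

    startup = STARTUPINFOW()
    startup.cb = ctypes.sizeof(startup)
    info = PROCESS_INFORMATION()
    # CreateProcessW may modify the command line, so pass a writable buffer.
    command_line = ctypes.create_unicode_buffer(build_command_line(args))

    ok = kernel32.CreateProcessW(
        None, command_line, None, None, False, 0, None, None,
        ctypes.byref(startup), ctypes.byref(info),
    )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())

    # This sketch does not wait for the child, so release both handles at once.
    kernel32.CloseHandle(info.hThread)
    kernel32.CloseHandle(info.hProcess)
    return info.dwProcessId
```

A call such as create_process_w([u"notepad.exe", u"\u041a\u0433\u044b\u044b\u0448\u0444\u0442.txt"]) would then hand the Unicode command line to Windows without passing through the ANSI code page; the program and file name here are placeholders.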

Passing UTF-8 in argv[1:] to private apps remains clumsy on a Unicode OS. Such tricks may be legitimate for 8-bit "Latin1" string OSes like Linux.

Update: vaab has created a patched version of Popen for Python 2.7 that fixes this issue.
See https://gist.github.com/vaab/2ad7051fc193167f15f85ef573e54eb9
Blog post with explanations: http://vaab.blog.kal.fr/2017/03/16/fixing-windows-python-2-7-unicode-issue-with-subprocesss-popen/

This Q&A on Unicode filenames on Windows with Python and subprocess.Popen() is based on a similar question found on Stack Overflow: https://stackoverflow.com/questions/1910275/
