python - Why is string.encode('utf-8') != bytes(map(ord, string)) true?

What encoding can bytes(map(ord, string)) be considered to use? Why is string.encode('utf-8') != bytes(map(ord, string)) sometimes true?

I ran into this problem while client-side JavaScript was interacting with a Django 1.5 (Python 3) application.

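Basically, I upload an mp3 file as a string using ajax and jDataView (I could not find a solution for uploading the file directly). I use jDataView to convert the file to a string. In my Django application, the file changes size when I save it. However, if I use bytes(map(ord, string)) instead of string.encode('utf-8'), the file is saved correctly. Why is that? Why is string.encode('utf-8') != bytes(map(ord, string))?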

My client-side code looks like this:

function send(file) {
    var reader = new FileReader();
    reader.onload = function(event) {
        var self = this;
        $.ajax({
            url: 'upload/',
            type: 'POST',
            data: {contents: (new jDataView(self.result)).getString()}
        });
    }
    reader.readAsArrayBuffer(file);
}

My view receives the data like this:

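from django.core.files.base import ContentFile

def upload(request):
    contents = request.POST.get('contents')
    track = Track.objects.all()[0]  # For testing only
    # Map each code point (0-255) back to a single byte
    contents = bytes(map(ord, contents))
    track.file.save('file.mp3', ContentFile(contents))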

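I checked that contents is the same in the JS code and in the Python code. Both have the same length, and judging by the first and last few characters that fit on my screen, they appear to have the same content.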

If I change the code to

def upload(request):
    contents = request.POST.get('contents')
    track = Track.objects.all()[0]  # For testing only
    contents = contents.encode('utf-8')
    track.file.save('file.mp3', ContentFile(contents))

the file size changes and the result is no longer a valid mp3 file.

Best answer

UTF-8 does not map Unicode code points directly to bytes. That holds only for the ASCII code points in the range U+0000 to U+007F. Beyond that range, UTF-8 uses 2 or more bytes per code point:

>>> '\u007f'.encode('utf8')
b'\x7f'
>>> '\u0080'.encode('utf8')
b'\xc2\x80'
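As a small illustrative sketch (the sample code points are chosen here and are not part of the original answer), the encoded length grows at the UTF-8 range boundaries:

```python
# Sample code points on either side of the UTF-8 length boundaries
# (chosen for illustration only).
samples = ['\u007f', '\u0080', '\u07ff', '\u0800', '\U00010000']

# Number of bytes UTF-8 needs for each code point.
utf8_lengths = [len(ch.encode('utf-8')) for ch in samples]
print(utf8_lengths)  # [1, 2, 2, 3, 4]
```

Any byte of the mp3 file with a value of 0x80 or above therefore becomes at least two bytes once the string is encoded as UTF-8.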

You are thinking of the Latin-1 encoding, where the code points U+0000 to U+00FF do map directly to bytes:

>>> string = ''.join([chr(i) for i in range(0x100)])
>>> string.encode('latin-1') == bytes(map(ord, string))
True
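The following sketch shows why the saved file changes size. It assumes the uploaded string holds one code point per original byte, which matches what the question describes; the sample bytes are made up and only stand in for real mp3 data.

```python
# Hypothetical sample bytes standing in for binary mp3 data.
raw = bytes([0x49, 0x44, 0x33, 0xFF, 0xFB, 0x90, 0x00])

# One code point per byte, like the string the view receives.
text = raw.decode('latin-1')

# Mapping each code point back to a byte restores the original data.
restored = bytes(map(ord, text))
print(restored == raw)  # True

# UTF-8 spends two bytes on each of 0xFF, 0xFB and 0x90, so the data grows.
as_utf8 = text.encode('utf-8')
print(len(raw), len(as_utf8))  # 7 10

# bytes(map(ord, ...)) only works for code points up to U+00FF.
try:
    bytes(map(ord, '\u0100'))
    out_of_range_rejected = False
except ValueError:
    out_of_range_rejected = True
print(out_of_range_rejected)  # True
```

In other words, bytes(map(ord, string)) behaves like string.encode('latin-1') for strings it can handle, and fails for anything beyond U+00FF.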

Rather than encoding the binary data as text, you could base64-encode it before storing it, or you could upgrade to Django 1.6 or later to use the binary field type.
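A minimal sketch of the base64 route, assuming the client sends the file contents as a base64 string instead of a raw character string. The helper name decode_upload is hypothetical and not part of Django or the original answer.

```python
import base64


def decode_upload(contents_b64):
    """Turn a base64 string from the request back into the original bytes.

    Hypothetical helper for illustration; the name is an assumption.
    """
    return base64.b64decode(contents_b64)


# Simulate the round trip with every possible byte value.
original = bytes(range(256))
wire = base64.b64encode(original).decode('ascii')  # ASCII-only text, safe to POST

decoded = decode_upload(wire)
print(decoded == original)  # True
```

Because base64 output contains only ASCII characters, it survives any text encoding step between the browser and the view unchanged, at the cost of roughly a third more data on the wire.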

A similar question about python - Why is string.encode('utf-8') != bytes(map(ord, string)) true? can be found on Stack Overflow: https://stackoverflow.com/questions/26223981/
