encoding - Stream/string/bytearray transformations in Python 3

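Python 3 cleans up Python's handling of Unicode strings. I assume that, as part of this effort, the codecs in Python 3 have become more restrictive, judging by the Python 3 documentation compared with the Python 2 documentation.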

For example, codecs that conceptually convert a byte stream to a different form of byte stream have been removed:

base64_codec
bz2_codec
hex_codec

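Codecs that conceptually convert Unicode to a different form of Unicode have also been removed (in Python 2 this one actually went between Unicode and a byte stream, but conceptually I think it is really Unicode to Unicode):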

rot_13

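My main question is: what is the "right way" in Python 3 to do what these removed codecs used to do? They are not codecs in the strict sense, but "transformations". Still, the interface and implementation would be very similar to those of codecs.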

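I don't care about rot_13, but I am interested in the "best way" to implement a conversion between line-ending styles (Unix line endings vs. Windows line endings). This should really be a Unicode-to-Unicode transformation done before encoding to a byte stream, especially when UTF-16 is used, as discussed in this other SO question.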
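
To illustrate the UTF-16 concern, here is a minimal sketch that converts line endings on the str before encoding. The helper name to_windows_newlines is hypothetical and not part of any library; it only shows why the conversion belongs on the text side rather than on the encoded bytes.

```python
def to_windows_newlines(text):
    """Hypothetical helper: convert every line ending in a str to CRLF."""
    # Normalize to "\n" first so that an existing "\r\n" is not doubled.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\n", "\r\n")


text = "one\ntwo\n"

# Text-level conversion, then encoding: every character stays two bytes wide.
encoded = to_windows_newlines(text).encode("utf-16-le")

# Byte-level conversion after encoding: inserting a single b"\r" byte
# misaligns the 2-byte code units, so the result no longer decodes
# to the intended text.
broken = text.encode("utf-16-le").replace(b"\n", b"\r\n")
```

When writing to a file, passing newline="\r\n" to the built-in open() in text mode gives the same effect without a helper.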

Best answer

It looks like all of these non-codec transformations are being handled on a case-by-case basis. Here is what I have found so far:

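base64 is now available through the base64 module
bz2 can now be done with the bz2 module
hex string encoding/decoding can be done with the hexlify and unhexlify functions of the binascii module (a somewhat hidden feature)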
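
As a rough sketch of the three module-based replacements listed above, each one round-trips a bytes object. The sample value in data is only an illustration.

```python
import base64
import binascii
import bz2

data = b"hello world"  # sample input, chosen only for illustration

# base64: bytes in, bytes out
b64 = base64.b64encode(data)
assert base64.b64decode(b64) == data

# bz2: compress and decompress bytes
compressed = bz2.compress(data)
assert bz2.decompress(compressed) == data

# hex: hexlify returns the hex digits as a bytes object
hexed = binascii.hexlify(data)
assert binascii.unhexlify(hexed) == data
```

All three operate on bytes and return bytes, which matches the "byte stream to byte stream" description in the question.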

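I guess this means there is no standard framework for creating such string/bytearray transformation modules; instead, they are handled on a case-by-case basis in Python 3.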

Update for Python 3.2

A comment on a blog post, "Compressing text using Python’s unicode support", alerted me to the fact that these codecs are back in Python 3.2.

Quoting the comment:

Since these are “text-to-text” or “binary-to-binary” transforms, though, the encode()/decode() methods in Python 3.x don’t support this style of usage – it’s a Python 2.x only feature).

The codecs themselves are back in 3.2, but you need to go through the codecs module API in order to use them – they aren’t available via the object method shorthand.

See the Python 3 docs for codecs: Binary Transforms.
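
A minimal sketch of going through the codecs module API, as the comment describes, using the module-level codecs.encode() and codecs.decode() functions. The full codec names (hex_codec, base64_codec, bz2_codec) are used here on the assumption that the short aliases may not be registered in every Python 3 release.

```python
import codecs

data = b"hello world"  # sample input, chosen only for illustration

# Binary-to-binary transforms: bytes in, bytes out.
hexed = codecs.encode(data, "hex_codec")
assert codecs.decode(hexed, "hex_codec") == data

b64 = codecs.encode(data, "base64_codec")
assert codecs.decode(b64, "base64_codec") == data

packed = codecs.encode(data, "bz2_codec")
assert codecs.decode(packed, "bz2_codec") == data
```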

From a blog post by Barry Warsaw:

Did you know that Python 2 provides some codecs for doing interesting conversions such as Caesar rotation (i.e. rot13)? Thus, you can do things like:
>>> 'foo'.encode('rot-13')
'sbb'
This doesn't work in Python 3 though, because even though certain str-to-str codecs like rot-13 still exist, the str.encode() interface requires that the codec return a bytes object. In order to use str-to-str codecs in both Python 2 and Python 3, you'll have to pop the hood and use a lower-level API, getting and calling the codec directly:
>>> from codecs import getencoder
>>> encoder = getencoder('rot-13')
>>> rot13string = encoder(mystring)[0]
You have to get the zeroth-element from the return value of the encoder because of the codecs API. A bit ugly, but it works in both versions of Python.
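
For reference, a self-contained version of the quoted snippet under Python 3. The value assigned to mystring is a sample input added here, since the quoted snippet leaves that name undefined.

```python
from codecs import getencoder

mystring = "foo"  # sample input; not defined in the quoted snippet

# Look up the str-to-str codec directly instead of calling str.encode().
encoder = getencoder("rot-13")

# The encoder returns a tuple of (output, length of input consumed).
rot13string, consumed = encoder(mystring)
```

Unpacking the tuple is equivalent to taking element zero, and it also makes the second value visible.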

For "encoding - Stream/string/bytearray transformations in Python 3", the original question can be found on Stack Overflow: https://stackoverflow.com/questions/1197589/
