Python: how to convert an 8-bit ASCII string to 16-bit Unicode

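Although Python 3.x solved the upper/lowercase problem for some locales (for example tr_TR.utf8), the Python 2.x branch lacks this fix. There are several workarounds for this issue, such as https://github.com/emre/unicode_tr/, but I do not like that kind of solution.

So I am implementing new upper/lower/capitalize/title methods to monkey-patch the unicode class, using the string.maketrans method.

The problem with maketrans is that the two strings must have the same length. The nearest solution that came to my mind is "How can I convert a 1-byte character to 2 bytes?"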


Note: the translate method only works with ASCII-encoded strings. When I pass u'İ' (a single character, \u0130) as an argument, translate raises an ASCII encoding error.

from string import maketrans
import unicodedata

c1 = unicodedata.normalize('NFKD', u'ı').encode('utf-8')
c2 = unicodedata.normalize('NFKD', u'İ').encode('utf-8')
c1, len(c1)
# ('\xc4\xb1', 2)
c2, len(c2)
# ('I\xcc\x87', 3)
'istanbul'.translate(maketrans(c1, c2))
# ValueError: maketrans arguments must have same length

Best answer

Unicode objects allow multi-character translation through a dictionary, instead of the two byte strings that maketrans maps onto each other.

#!python2
#coding:utf8
D = {ord(u'i'):u'İ'}
print u'istanbul'.translate(D)

Output:

İstanbul
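Since the question is ultimately about Turkish-aware upper/lower methods, the same dictionary approach can be extended to cover both dotted and dotless i. The following is only a minimal sketch: the names TR_UPPER, TR_LOWER, tr_upper and tr_lower are hypothetical and not part of any library, and it assumes the input is a Unicode string (unicode in Python 2, str in Python 3).

```python
# -*- coding: utf-8 -*-
# Hypothetical helpers for illustration; not part of any library.

# Only the letters whose case pairing differs in Turkish need an entry.
TR_UPPER = {ord(u'i'): u'İ', ord(u'ı'): u'I'}
TR_LOWER = {ord(u'İ'): u'i', ord(u'I'): u'ı'}

def tr_upper(text):
    # Apply the Turkish pairs first, then let upper() handle the rest.
    return text.translate(TR_UPPER).upper()

def tr_lower(text):
    # Apply the Turkish pairs first, then let lower() handle the rest.
    return text.translate(TR_LOWER).lower()
```

With these definitions, tr_upper(u'istanbul') gives u'İSTANBUL' and tr_lower(u'ISI') gives u'ısı'. Translating before calling the built-in method keeps the default, non-Turkish case pairing from being applied to these four letters.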

If you start with an ASCII byte string and want the result in UTF-8, simply decode and encode around the translation:

#!python2
#coding:utf8
D = {ord(u'i'):u'İ'}
s = 'istanbul'.decode('ascii')
t = s.translate(D)
s = t.encode('utf8')
print repr(s)

Output:

'\xc4\xb0stanbul'
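If this decode/translate/encode round trip is needed in several places, it could be wrapped in a small function. This is a sketch only: translate_bytes and its parameter names are hypothetical, and the default encodings simply mirror the example above.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper for illustration; not part of any library.
def translate_bytes(data, table, source_encoding='ascii', target_encoding='utf-8'):
    # Decode to a Unicode string so that translate() accepts a dictionary.
    text = data.decode(source_encoding)
    # Translate on characters, then encode back to bytes.
    return text.translate(table).encode(target_encoding)

encoded = translate_bytes(b'istanbul', {ord(u'i'): u'İ'})
print(repr(encoded))
```

The translation itself always happens on the decoded Unicode string, so the replacement may occupy more bytes than the original character once it is encoded again.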

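The following technique does the job of maketrans. Note that the dictionary keys must be Unicode ordinals, but the values can be Unicode ordinals, Unicode strings, or None. If the value is None, the character is deleted during translation.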

#!python2
#coding:utf8
def maketrans(a,b):
    return dict(zip(map(ord,a),b))
D = maketrans(u'àáâãäå',u'ÀÁÂÃÄÅ')
print u'àbácâdãeäfåg'.translate(D)

Output:

ÀbÁcÂdÃeÄfÅg
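The three kinds of dictionary values mentioned above can be combined in one table. A small sketch, where the sample string and the variable names table and result are chosen purely for illustration:

```python
# -*- coding: utf-8 -*-
table = {
    ord(u'a'): ord(u'A'),  # ordinal value: one-to-one replacement
    ord(u'ß'): u'ss',      # string value: one character becomes two
    ord(u'-'): None,       # None: the character is deleted
}
result = u'straße-a'.translate(table)
print(result)
```

Here result is u'strAsseA': both occurrences of a are replaced, ß expands to two characters, and the hyphen is removed.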

Reference: str.translate

A similar question about how Python converts an 8-bit ASCII string to 16-bit Unicode can be found on Stack Overflow: https://stackoverflow.com/questions/27448120/
