python-3.x - Python3 表情符号字符作为 unicode

我在 python3 中有一个字符串，其中包含表情符号，我想将表情符号视为它们的 unicode 表示形式。我需要对这种格式的表情符号进行一些操作。

s = '😬 😎 hello'

这将每个表情符号视为自己的字符，例如 len(s) == 9 && s[0] == 😬

我想更改字符串的格式，使其采用 unicode 点，这样

s = '😬 😎 hello'
u = to_unicode(s)   # Some function to change the format.
print(u) # '\ud83d\ude2c \ud83d\ude0e hello'
u[0] == '\ud83d' and u[1] == '\ude2c'
len(u) == 11

对于创建一个将 s 更改为 u 的函数 to_unicode 有什么想法吗？我可能正在思考字符串/unicode 在 python3 中的工作方式是错误的，因此任何帮助/更正将不胜感激。

最佳答案

以下代码将获取映射为两个 UTF-16 单词的任何字符并将其转换为十六进制序列。

s = '\U0001f62c \U0001f60e hello'

def pairup(b):
    return [(b[i] << 8 | b[i+1]) for i in range(0, len(b), 2)]

def utf16(c):
    e = c.encode('utf_16_be')
    return ''.join(chr(x) for x in pairup(e))

u = ''.join(utf16(c) for c in s)
print(repr(u))
print(u[0] == '\ud83d' and u[1] == '\ude2c')
print(len(u))

'\ud83d\ude2c \ud83d\ude0e hello'
True
11

我以为这会是一件理所当然的事，但事实证明比我想象的要棘手。特别是因为我第一次没有正确理解这个问题。

关于python-3.x - Python3 表情符号字符作为 unicode，我们在Stack Overflow上找到一个类似的问题： https://stackoverflow.com/questions/32826565/

上一篇：django - @vary_on_cookie 由于非 Django cookie 而失败

下一篇：java - 使用 OTG USB 从 PC 到 Android USB 流数据 - 如何？

相关文章：

python - Hook 生成器函数的最佳方法(如果有)

python - 安装 python 模块后出现 Linux ModuleNotFoundError

c - 格式为\Unnnnnnnn 的 Unicode 代码点

html - Unicode 表情符号的颜色

css - Unicode 表情符号字符根据浏览器使用不同的图像/字体

Python列表——去重和添加子列表元素

Python 2 和 3 兼容的字符串文件编写器

android - 选择在 android 中查看不同语言的文本？

python - Python 的 urllib 中网页的 Unicode 问题

android - iOS和Android之间的表情符号