python - Is there a way to find a character's Unicode code point in Python 2.7?

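I'm using International Phonetic Alphabet (IPA) symbols in my Python program, a rather odd set of characters whose UTF-8 encodings range from 1 to 3 bytes long. This thread from a few years ago asked essentially the opposite question, and it seems that ord(character) can retrieve a decimal number that I could convert to hex and from there to a code point, but the input to ord() seems to be limited to a single byte. If I try ord() on any non-ASCII character, such as ɨ, it outputs: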

TypeError: ord() expected a character, but a string of length 2 found

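With that no longer an option, is there any way in Python 2.7 to find the Unicode code point of a given character? (And does that character then have to be a unicode type?) I don't mean by just manually looking it up on a Unicode table, either.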

Best answer

With that no longer an option, is there any way in Python 2.7 to find the Unicode code point of a given character? (And does that character then have to be a unicode type?) I don't mean by just manually looking it up on a Unicode table, either.

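You can only find the Unicode code point of a unicode object. To convert your byte string to a unicode object, decode it with mystr.decode(encoding), where encoding is the encoding of your string. (You know the encoding of your string, right? It's probably UTF-8. :-) Then you can use ord as described in the instructions you already found. The TypeError in the question arises because the UTF-8 encoding of ɨ is two bytes, so the undecoded byte string has length 2, whereas ord expects a string of length 1.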

>>> ord(b"ɨ".decode('utf-8'))
616
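As a minimal sketch of the remaining step the question mentions (turning the decimal value into the usual U+XXXX notation), the snippet below decodes the two UTF-8 bytes of ɨ and formats the result. The variable names (raw, char, code_point, label) are illustrative only, and the bytes are written as escapes so the snippet does not depend on the source file's encoding.

```python
# UTF-8 bytes for the IPA symbol ɨ, written as escapes so the
# snippet does not depend on the source file's encoding.
raw = b"\xc9\xa8"

# Decode the byte string into a unicode object of length 1.
char = raw.decode("utf-8")

# ord() now sees a single character and returns its code point.
code_point = ord(char)

# Format the decimal value as the conventional U+XXXX label.
label = "U+{:04X}".format(code_point)

print(code_point)  # 616
print(label)       # U+0268
```

Here len(raw) is 2 while len(char) is 1, which is the difference behind the TypeError shown in the question.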

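Incidentally, from your question it sounds like you're working with the strings in their UTF-8 encoded bytes form. That's probably going to be a pain. You should decode the strings to unicode objects as soon as you get them, and only encode them when you need to output them somewhere.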
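The decode-early, encode-late advice could be sketched roughly as follows. The helper names to_text and to_bytes are hypothetical, and UTF-8 is assumed as the encoding at both ends; the sample input is the UTF-8 encoding of the two IPA symbols ɨ and ŋ.

```python
def to_text(raw_bytes, encoding="utf-8"):
    """Decode bytes once, at the point where they enter the program."""
    return raw_bytes.decode(encoding)


def to_bytes(text, encoding="utf-8"):
    """Encode text once, at the point where it leaves the program."""
    return text.encode(encoding)


# Hypothetical input: UTF-8 bytes for the IPA symbols ɨ and ŋ.
incoming = b"\xc9\xa8\xc5\x8b"

# Input boundary: from here on, work only with the unicode object.
text = to_text(incoming)

# Per-character operations now behave as expected.
points = [ord(ch) for ch in text]

# Output boundary: encode only when writing the data somewhere.
outgoing = to_bytes(text)

print(len(incoming), len(text))  # 4 bytes, 2 characters
print(points)                    # [616, 331]
```

Keeping the conversion at the two boundaries means everything in between can index, slice, and measure the text by character rather than by byte.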

The original question and this answer can be found on Stack Overflow: https://stackoverflow.com/questions/38909362/
