c# - Can I use different code points in Python3?

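I'm running into a lot of string-indexing problems going from C# to Python. Basically, an existing data pipeline (in C#) produces string indices for a Python model to consume. The problem is that the two languages count different units in their Unicode strings (C# counts UTF-16 code units, Python 3 counts code points), as described here: http://illegalargumentexception.blogspot.com/2010/04/i18n-comparing-character-encoding-in-c.html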

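So string lengths and indices in C# (16-bit, implicitly UTF-16) do not map 100% onto those in Python (16 or 32 bits depending on the build in Python versions before 3.3; since Python 3.3, len() always counts code points). Whenever a string contains characters above 0xFFFF (i.e. needing more than 16 bits), Python reports a shorter length than C#.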
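To see why a character above 0xFFFF counts once in Python but twice in C#, here is a minimal sketch of the standard UTF-16 surrogate-pair computation. The helper name utf16_units is hypothetical; the character used is U+10911 (PHOENICIAN LETTER SADE), the first character of the example string in this question.

```python
def utf16_units(ch):
    """Hypothetical helper: the UTF-16 code units C# would store for one code point."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: a single 16-bit unit, one str element in Python too
        return [cp]
    # Supplementary planes: split the remaining 20 bits into a high/low surrogate pair
    cp -= 0x10000
    high = 0xD800 + (cp >> 10)
    low = 0xDC00 + (cp & 0x3FF)
    return [high, low]


sade = "\U00010911"  # PHOENICIAN LETTER SADE
print(len(sade))                            # 1 code point, as Python counts it
print([hex(u) for u in utf16_units(sade)])  # ['0xd802', '0xdd11']
print(len(sade.encode("utf-16-le")) // 2)   # 2 code units, as C#'s Length counts it
```

The same arithmetic is what the UTF-16 codec performs, so the manual result agrees with encoding the character.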

The question is: is there any way to make the string indices and lengths match? Can Python be forced to use implicit 16-bit units like C#?

A concrete example:

𐤑𐤅𐤓, Ṣur

and its UTF-8 bytes:

b'\xf0\x90\xa4\x91\xf0\x90\xa4\x85\xf0\x90\xa4\x93, \xe1\xb9\xa2ur'

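In Python this string has a length of 12, while C# reports 15. The difference of 3 matches the three Phoenician characters, which lie above U+FFFF and each take two UTF-16 code units in C#. The indices also differ between the two languages.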
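If the C# pipeline keeps emitting UTF-16 offsets, one option is to translate them on the Python side. A minimal sketch, assuming the offsets are UTF-16 code-unit positions as C# reports them; the helper names py_to_utf16_index and utf16_to_py_index are hypothetical, not part of any library:

```python
def py_to_utf16_index(s, py_index):
    """Convert a Python str index (code points) to a C#-style UTF-16 offset."""
    # "utf-16-le" emits exactly 2 bytes per code unit and no BOM
    return len(s[:py_index].encode("utf-16-le")) // 2


def utf16_to_py_index(s, utf16_index):
    """Convert a C#-style UTF-16 offset to a Python str index (code points)."""
    if utf16_index < 0:
        raise ValueError("negative UTF-16 offset")
    units = 0
    for py_index, ch in enumerate(s):
        if units == utf16_index:
            return py_index
        if units > utf16_index:
            # The offset pointed at the low surrogate of the previous character
            raise ValueError("UTF-16 offset falls inside a surrogate pair")
        # Characters above U+FFFF occupy two UTF-16 code units (a surrogate pair)
        units += 2 if ord(ch) > 0xFFFF else 1
    if units == utf16_index:
        return len(s)  # offset just past the last character
    raise ValueError("UTF-16 offset is not a valid position in the string")


s = "\U00010911\U00010905\U00010913, \u1E62ur"  # the example string
print(py_to_utf16_index(s, 5))  # 8: where C# would place "Ṣ"
print(utf16_to_py_index(s, 8))  # 5: the same position as a Python index
```

Raising on offsets that land between the two halves of a surrogate pair is a design choice here; such an offset has no equivalent position in a Python str.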

Best Answer

You may want to use the StringInfo class, as suggested in this answer: Why is the length of this string longer than the number of characters in it?

using System;
using System.Text;
using System.Globalization;

namespace StackOverflow {
    class Program {
        public static void Main(string[] args) {
            var s = "𐤑𐤅𐤓, Ṣur";
            // s.Length == 11 (UTF-16 code units)
            Console.WriteLine("{0}: {1}", s, s.Length);

            // LengthInTextElements == 8 (text elements, i.e. user-perceived characters)
            var si = new StringInfo(s);
            Console.WriteLine("{0}: {1}", s, si.LengthInTextElements);
        }
    }
}

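Alternatively, on the Python side, you can count UTF-16 code units yourself by encoding the string. Python's len() counts code points, whereas the UTF-16 byte length divided by 2 counts code units: characters above U+FFFF are encoded as surrogate pairs (4 bytes) and count as two, which is what C#'s String.Length reports: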

#!/usr/bin/env python3

s = "𐤑𐤅𐤓, Ṣur"
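# len == 8 (number of code points)
print("{}: {}".format(s, len(s)))

# == 11 (number of UTF-16 code units, i.e. what C#'s String.Length reports)
# "utf-16-le" avoids the 2-byte BOM that the plain "utf-16" codec prepends
print(len(s.encode("utf-16-le")) // 2)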
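Building on the same UTF-16 encoding, C#-side indices can also be used directly by slicing the encoded bytes instead of the Python string. A minimal sketch, assuming the pipeline emits Substring-style (start, length) pairs counted in UTF-16 code units; csharp_substring is a hypothetical helper name:

```python
def csharp_substring(s, start, length):
    """Hypothetical helper emulating C#'s s.Substring(start, length), where
    start and length count UTF-16 code units instead of code points."""
    data = s.encode("utf-16-le")  # exactly 2 bytes per UTF-16 code unit, no BOM
    total_units = len(data) // 2
    if start < 0 or length < 0 or start + length > total_units:
        # C# throws ArgumentOutOfRangeException here; a Python slice would not
        raise IndexError("substring range is outside the string")
    chunk = data[2 * start : 2 * (start + length)]
    # Strict decoding raises UnicodeDecodeError if the range splits a surrogate
    # pair (C# would instead return a string containing a lone surrogate)
    return chunk.decode("utf-16-le")


s = "\U00010911\U00010905\U00010913, \u1E62ur"  # the example string
start, length = 8, 3  # e.g. values produced by the C# pipeline
print(csharp_substring(s, start, length) == "\u1E62ur")  # True
```

Unlike the offset-conversion approach, this keeps every index in C#'s units end to end, so the Python model never has to reinterpret them.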

Regarding c# - Can I use different code points in Python3?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47879399/
