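python - Convert a Unicode representation of a number to an ASCII string

I have been looking for a simple way to convert a number from a unicode string to an ASCII string in Python. For example, the input: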

input = u'\u0663\u0669\u0668\u066b\u0664\u0667'

should yield '398.47'.

I started with:

NUMERALS_TRANSLATION_TABLE = {0x660:ord("0"), 0x661:ord("1"), 0x662:ord("2"), 0x663:ord("3"), 0x664:ord("4"), 0x665:ord("5"), 0x666:ord("6"), 0x667:ord("7"), 0x668:ord("8"), 0x669:ord("9"), 0x66b:ord(".")}
input.translate(NUMERALS_TRANSLATION_TABLE)

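This solution works, but I want to support all number-related characters in Unicode, not just the Arabic ones. I could translate the digits by iterating over the unicode string and running unicodedata.digit(input[i]) on each character. I don't like that solution, because it doesn't handle '\u066b' or '\u2013'. I could handle those by using translate as a fallback, but I'm not sure whether there are other such characters that I'm not currently aware of, so I'm looking for a better, more elegant solution.

Any suggestions would be greatly appreciated.

Best answer

Using unicodedata.digit() to look up the digit value of 'numeric' code points is the right approach: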

>>> import unicodedata
>>> unicodedata.digit(u'\u0663')
3

This uses the Unicode standard's data to look up the numeric value of a given code point.
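
As a small illustration of why the separator needs separate handling, here is a sketch (assuming Python 3, where every str is already Unicode) that uses the optional default argument of unicodedata.digit(). Without a default, the call raises ValueError for a character that has no digit value.

```python
import unicodedata

# ARABIC-INDIC DIGIT THREE has a digit value defined by the standard.
print(unicodedata.digit('\u0663'))        # prints 3

# ARABIC DECIMAL SEPARATOR is punctuation, not a digit, so the
# supplied default (None here) is returned instead of a ValueError.
print(unicodedata.digit('\u066b', None))  # prints None
```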

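You can build your translation table by using str.isdigit() to test for digits; it is true for every code point for which the standard defines a digit value. For decimal points, you could look for DECIMAL SEPARATOR in the character name; the standard doesn't track these separately by any other metric: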

NUMERALS_TRANSLATION_TABLE = {
    i: unicode(unicodedata.digit(unichr(i)))
    for i in range(2 ** 16) if unichr(i).isdigit()}
NUMERALS_TRANSLATION_TABLE.update(
    (i, u'.') for i in range(2 ** 16)
    if 'DECIMAL SEPARATOR' in unicodedata.name(unichr(i), ''))

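This produces a table of 447 entries (the exact count depends on the version of the Unicode database bundled with your Python), including 2 decimal points at U+066b ARABIC DECIMAL SEPARATOR and U+2396 DECIMAL SEPARATOR KEY SYMBOL; the latter is really only a made-up symbol to put on the decimal separator key of a numeric keypad when the manufacturer doesn't want to commit to printing a , or a . decimal separator on that key.

Demo: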
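
To see which characters the name-based test picks up on a given installation, a sketch along these lines may help. It assumes Python 3 and scans the whole code point range rather than only the first 2 ** 16; the variable name decimal_separators is made up for this illustration.

```python
import sys
import unicodedata

# Collect every code point whose official Unicode name mentions
# "DECIMAL SEPARATOR". The '' default passed to unicodedata.name()
# keeps unnamed code points from raising ValueError during the scan.
decimal_separators = {
    cp: unicodedata.name(chr(cp))
    for cp in range(sys.maxunicode + 1)
    if 'DECIMAL SEPARATOR' in unicodedata.name(chr(cp), '')
}

# Print the matches as "U+XXXX NAME", ordered by code point.
for cp, name in sorted(decimal_separators.items()):
    print('U+%04X %s' % (cp, name))
```

The output depends on the Unicode database version, so it is worth reviewing the list before mapping every match to '.'.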

>>> import unicodedata
>>> NUMERALS_TRANSLATION_TABLE = {
...     i: unicode(unicodedata.digit(unichr(i)))
...     for i in range(2 ** 16) if unichr(i).isdigit()}
>>> NUMERALS_TRANSLATION_TABLE.update(
...     (i, u'.') for i in range(2 ** 16)
...     if 'DECIMAL SEPARATOR' in unicodedata.name(unichr(i), ''))
>>> input = u'\u0663\u0669\u0668\u066b\u0664\u0667'
>>> input.translate(NUMERALS_TRANSLATION_TABLE)
u'398.47'
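
The code above is written for Python 2 (unicode, unichr). A rough Python 3 equivalent might look like the sketch below; it is not part of the original answer. The helper name build_numerals_table is hypothetical, and iterating up to sys.maxunicode (instead of 2 ** 16) is an added assumption so that digits outside the Basic Multilingual Plane are covered too.

```python
import sys
import unicodedata


def build_numerals_table():
    """Map digit code points to ASCII digits and any code point
    named '... DECIMAL SEPARATOR ...' to '.' (Python 3 sketch)."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if ch.isdigit():
            # isdigit() is true exactly when a digit value is defined,
            # so unicodedata.digit() does not raise here.
            table[cp] = str(unicodedata.digit(ch))
        elif 'DECIMAL SEPARATOR' in unicodedata.name(ch, ''):
            table[cp] = '.'
    return table


NUMERALS_TRANSLATION_TABLE = build_numerals_table()

text = '\u0663\u0669\u0668\u066b\u0664\u0667'
print(text.translate(NUMERALS_TRANSLATION_TABLE))  # prints 398.47
```

Building the table walks over a million code points, so it is best done once at import time and reused for every call to translate().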

Regarding python - Convert a Unicode representation of a number to an ASCII string, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25313773/
