python - Converting unicode strings to their original format

Possible Duplicate:
Converting a latin string to unicode in python

After storing it in a file, I have a list in the following format:

list_example = [
         u"\u00cdndia, Tail\u00e2ndia &amp; Cingapura",
         u"Lines through the days 1 (Arabic) \u0633\u0637\u0648\u0631 \u0639\u0628\u0631 \u0627\u0644\u0623\u064a\u0627\u0645 1",
]

But the actual format of the strings in the list is:

actual_format = [
         "Índia, Tailândia & Cingapura ",
         "Lines through the days 1 (Arabic) سطور عبر الأيام 1 | شمس الدين خ "
]

How can I convert the strings in list_example into the list of strings shown in actual_format?

Best answer

Your question is a bit unclear to me. In any case, the following guidelines should help you solve the problem.

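If you define these strings in Python source code, then you should

know which character encoding your editor uses to save the source file (for example utf-8)
declare that encoding on the first (or second) line of the source file, for example # -*- coding: utf-8 -*-
define these strings as unicode objects: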

strings = [u"Índia, Tailândia & Cingapura ", u"Lines through the days 1 (Arabic) سطور عبر الأيام 1 | شمس الدين خ "]

(Note: in Python 3, string literals are unicode objects by default, so the u prefix is not needed. In Python 2, unicode strings have type unicode; in Python 3, they have type str.)
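As a small illustration of the note above, the sketch below compares the escaped form from list_example with the literal form from actual_format. It assumes the source file is saved as UTF-8; the variable names are made up for this example. The point is that both spellings produce the same unicode object, so no conversion between them is needed once the text is in memory.

```python
# -*- coding: utf-8 -*-
# Assumption: this file is saved as UTF-8, matching the declaration above.

# The same text written with \u escapes and with literal characters.
escaped = u"\u00cdndia, Tail\u00e2ndia & Cingapura"
literal = u"Índia, Tailândia & Cingapura"

# Both literals contain identical code points, so they compare equal.
same_text = (escaped == literal)

# The type is named 'unicode' on Python 2 and 'str' on Python 3.
type_name = type(escaped).__name__

print(same_text, type_name)
```

The escapes only matter for how the text is written down or displayed (for example by repr in Python 2), not for what the string contains.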

When you want to save these strings to a file, you should define the character encoding explicitly:

# Open in binary mode because encode() returns bytes.
with open('filename', 'wb') as f:
    s = u'\n'.join(strings)
    f.write(s.encode('utf-8'))

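When you want to read these strings back from that file, you must again define the character encoding explicitly so that the file contents are decoded correctly: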

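# Open in binary mode so each line is bytes that can be decoded.
# Note that each decoded line keeps its trailing newline.
with open('filename', 'rb') as f:
    strings = [line.decode('utf-8') for line in f]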
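The write and read steps above can also be sketched as one round trip. This version uses io.open with an encoding argument, which is available in both Python 2 and Python 3, so the encoding and decoding happen inside the file object instead of through explicit encode/decode calls. It is a minimal sketch, not the answer's own code: the temporary path and the names path and restored are assumptions made for the example, and the Arabic string is shortened.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

strings = [
    u"\u00cdndia, Tail\u00e2ndia & Cingapura",
    u"Lines through the days 1 (Arabic) \u0633\u0637\u0648\u0631",
]

# Hypothetical file location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "strings.txt")

# io.open encodes on write, so we pass unicode text rather than bytes.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"\n".join(strings))

# io.open decodes on read; strip the line terminator from each line.
with io.open(path, encoding="utf-8") as f:
    restored = [line.rstrip(u"\n") for line in f]

print(restored == strings)
```

Stating the encoding on both sides is what keeps the round trip lossless; relying on the platform default can silently produce different bytes on different machines.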

Regarding python - Converting unicode strings to their original format, a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/10753297/
