python-3.x - How do I handle utf-8 text with Python 3?

Tags: python-3.x utf-8 character-encoding

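I need to parse various text sources and then print or store the results somewhere.

Whenever I hit a non-ASCII character, I can't print it correctly: it gets converted to bytes, and I don't know how to get the actual characters back.

(I'm quite new to Python. I come from PHP, where I never had any utf-8 problems.)

Here is a code example: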

#!/usr/bin/python
# -*- coding: utf-8 -*-

import codecs
import feedparser

url = "http://feeds.bbci.co.uk/japanese/rss.xml"
feeds = feedparser.parse(url)
title = feeds['feed'].get('title').encode('utf-8')

print(title)

file = codecs.open("test.txt", "w", "utf-8")
file.write(str(title))
file.close()

I want to print the RSS title (BBC Japanese - ホーム) and also write it to a file, but the result is this:

b'BBC Japanese - \xe3\x83\x9b\xe3\x83\xbc\xe3\x83\xa0'

both on the screen and in the file. Is there a proper way to do this?

Best answer

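In Python 3, bytes and str are two different types. str represents text (any string, including Unicode), while bytes is a sequence of raw bytes. When you call encode() on something, you convert it from its str representation to its bytes representation in a specific encoding.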
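The difference can be seen directly in the interpreter. The small sketch below hard-codes the title from the question (so no network access or feedparser is needed) and shows that encode() produces exactly the bytes from the question's output, that str() on a bytes object returns its repr instead of decoding it (which is why the file also contained b'...'), and that decode() reverses the conversion:

```python
# The title from the question, as a str (text)
title = "BBC Japanese - ホーム"

# encode(): str -> bytes, using a specific encoding
raw = title.encode("utf-8")
print(raw)        # b'BBC Japanese - \xe3\x83\x9b\xe3\x83\xbc\xe3\x83\xa0'
print(type(raw))  # <class 'bytes'>

# str() on bytes does NOT decode them; it returns the repr,
# which is what ended up in the question's test.txt
print(str(raw))   # b'BBC Japanese - \xe3\x83\x9b\xe3\x83\xbc\xe3\x83\xa0'

# decode(): bytes -> str, using the same encoding
text = raw.decode("utf-8")
print(text == title)  # True
```

In short: keep text as str while you work with it, and only deal with bytes at the boundaries (network, files), where an explicit encoding is given.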

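In your case, feedparser already gives you a str, so you only need to drop the encode('utf-8') call (and the str() around title when writing it):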

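#!/usr/bin/env python3

import feedparser

url = "http://feeds.bbci.co.uk/japanese/rss.xml"
feeds = feedparser.parse(url)
# feedparser already returns a str, so there is no need to encode() it
title = feeds['feed'].get('title')

print(title)

# In Python 3 the built-in open() accepts an encoding, so codecs.open() is not needed;
# the with block closes the file automatically
with open("test.txt", "w", encoding="utf-8") as file:
    file.write(title)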
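To check the file side without depending on the feed, here is a minimal sketch. It assumes a hard-coded title as a stand-in for feeds['feed'].get('title') and writes to a temporary directory (the path is only for illustration). It writes the str with the built-in open() and an explicit encoding, then reads the file back both as text and as raw bytes:

```python
import os
import tempfile

# Hypothetical stand-in for feeds['feed'].get('title'); feedparser returns a str
title = "BBC Japanese - ホーム"

# Illustrative location for the output file
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# Text mode with an explicit encoding: the file object encodes the str to UTF-8
with open(path, "w", encoding="utf-8") as f:
    f.write(title)

# Reading in text mode decodes the bytes on disk back into a str
with open(path, encoding="utf-8") as f:
    restored = f.read()

# Reading in binary mode shows the raw UTF-8 bytes actually stored
with open(path, "rb") as f:
    on_disk = f.read()

print(restored == title)                 # True
print(on_disk == title.encode("utf-8"))  # True
```

Passing encoding explicitly is worthwhile because, without it, open() falls back to a default that depends on the platform's locale settings, and that default is not UTF-8 on every system.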

For "python-3.x - How do I handle utf-8 text with Python 3?", a similar question was found on Stack Overflow: https://stackoverflow.com/questions/38346619/

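Related articles:

php - JSON created from PHP gives wrong data?

php - iconv(): Wrong charset, conversion from `auto' to `utf-8//IGNORE' is not allowed

java - JSP cannot display Russian characters

MYSQL 5.1.61 sorting Central European languages in utf8

python - Overriding a read-only property with a read-only column that gets the same value

python - The current state of optional static typing in Python?

python - How do I avoid a runtime error in this coding challenge?

python - How do I correctly compare unicode strings from psycopg2 in Python?

eclipse - Nutch cannot fetch UTF-8 characters

python - Iterating over S3 objects, not just all keys/buckets in an object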