python - Parsing Gmail e-mail with Python when the body contains Unicode characters

Tags: python email unicode

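I wrote a script to parse e-mails. It works fine for messages sent from the Mac OS X Mail client (the only client I have tested so far), but my parser fails when the body of a message contains Unicode letters.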

For example, I sent a message whose content was ąčę.

Here is the part of my script that parses both the body and the attachments:

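# Python 2.7 / Django 1.5, as in the question
from email.feedparser import FeedParser
from django.core.files.base import ContentFile

# msg holds the raw message text
p = FeedParser()
p.feed(msg)
msg = p.close()
attachments = []
body = None
for part in msg.walk():
    # multipart/* parts are only containers; walk() visits their children
    if part.get_content_type().startswith('multipart/'):
        continue
    try:
        filename = part.get_filename()
    except Exception:
        # unicode letters in filename, set default name then
        filename = 'Mail attachment'

    if part.get_content_type() == "text/plain" and not body:
        # decode=True only undoes base64/quoted-printable; the result is
        # still a byte string in the part's charset
        body = part.get_payload(decode=True)
    elif filename is not None:
        attachments.append(ContentFile(part.get_payload(decode=True), filename))

if body is None:
    body = ''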
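To see what the loop above actually stores in `body`, here is a minimal Python 3 sketch (the question itself runs Python 2.7) that feeds a hand-built message through `FeedParser`. The message, its `iso-8859-13` charset and the base64 transfer encoding are assumptions chosen for illustration; the point is that `get_payload(decode=True)` only undoes the Content-Transfer-Encoding and returns raw bytes that are still in the sender's charset.

```python
import base64
from email.feedparser import FeedParser

# Hypothetical message: a text/plain body in a non-UTF-8 charset,
# base64-encoded. The charset is an assumption, not taken from the question.
body_bytes = "ąčę".encode("iso-8859-13")
raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: text/plain; charset="iso-8859-13"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    + base64.b64encode(body_bytes).decode("ascii")
    + "\n"
)

p = FeedParser()
p.feed(raw)
msg = p.close()

# base64 is removed here, but the charset is NOT applied: payload is bytes
payload = msg.get_payload(decode=True)
print(type(payload).__name__, payload)
print(msg.get_content_charset())
```

Printing or storing these bytes as if they were UTF-8 text is what produces garbage such as ����.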

As I mentioned, it works for messages from OS X Mail, but not for messages from Gmail.

Traceback:

Traceback (most recent call last):
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/core/handlers/base.py", line 116, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
    return view_func(*args, **kwargs)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/views/decorators/http.py", line 41, in inner
    return func(request, *args, **kwargs)
  File "/Users/aemdy/PycharmProjects/rezervavau/bms/messages/views.py", line 66, in accept
    Message.accept(request.POST.get('msg'))
  File "/Users/aemdy/PycharmProjects/rezervavau/bms/messages/models.py", line 261, in accept
    thread=thread
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/manager.py", line 149, in create
    return self.get_query_set().create(**kwargs)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/query.py", line 391, in create
    obj.save(force_insert=True, using=self.db)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/base.py", line 532, in save
    force_update=force_update, update_fields=update_fields)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/base.py", line 627, in save_base
    result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/manager.py", line 215, in _insert
    return insert_query(self.model, objs, fields, **kwargs)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/query.py", line 1633, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 920, in execute_sql
    cursor.execute(sql, params)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/backends/util.py", line 47, in execute
    sql = self.db.ops.last_executed_query(self.cursor, sql, params)
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/operations.py", line 201, in last_executed_query
    return cursor.query.decode('utf-8')
  File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 115: invalid continuation byte
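The last frames show Django decoding the executed query text as UTF-8. A short Python 3 sketch of why that fails when the body is still raw bytes in a single-byte charset: `iso-8859-13` is only a guess at what Gmail declared (it happens to encode ą as 0xE0, the byte named in the error). A lead byte of 0xE0 announces a three-byte UTF-8 sequence, but the byte after it is not a valid continuation byte.

```python
# Hypothetical body bytes: "ąčę" encoded in a Baltic single-byte charset.
raw_body = "ąčę".encode("iso-8859-13")

error = None
try:
    raw_body.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; Python 3 clears exc after the block
    # exc.start is the offset of the offending byte within raw_body
    print(hex(raw_body[exc.start]), exc.reason)

# Decoding with the charset the part actually declares succeeds.
decoded = raw_body.decode("iso-8859-13")
print(decoded)
```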

My script gives me the body ����. How can I decode it to get ąčę back?

Best answer

Well, I found the solution myself. I'll do some testing now and let you know if anything fails.

I needed to decode the body a second time, using the charset the part declares:

body = part.get_payload(decode=True).decode(part.get_content_charset())
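The one-liner above raises `TypeError` when a part has no `charset` parameter (`get_content_charset()` then returns `None`) and `LookupError` when the charset name is unknown to Python. Below is a hedged Python 3 sketch of a more defensive variant. The helper names `decode_text_part` and `parse_body`, the `us-ascii` default (the RFC 2045 default for `text/plain`) and the `errors="replace"` policy are my own assumptions, not part of the original answer.

```python
from email.feedparser import FeedParser


def decode_text_part(part, default_charset="us-ascii"):
    """Hypothetical helper: return the text of a MIME part as a str.

    Undoes the transfer encoding, then decodes the bytes with the part's
    declared charset, falling back to default_charset when the header
    omits the charset or names one Python does not know.
    """
    payload = part.get_payload(decode=True)
    if payload is None:  # multipart containers have no single payload
        return ""
    charset = part.get_content_charset(default_charset)
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # unrecognised charset name
        return payload.decode(default_charset, errors="replace")


def parse_body(raw_message):
    """Hypothetical usage: text of the first text/plain part, or ''."""
    p = FeedParser()
    p.feed(raw_message)
    msg = p.close()
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            return decode_text_part(part)
    return ""
```

Using `errors="replace"` trades strictness for robustness: a mislabelled body yields U+FFFD characters instead of an exception, which may or may not be what you want to store.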

This question, "python - Parsing Gmail e-mail with Python when the body contains Unicode characters", was found on Stack Overflow: https://stackoverflow.com/questions/13445498/
