ruby - How best to encode or clean email bodies when collecting mail with Ruby Net::IMAP

Tags: ruby email encoding character-encoding imap

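I'm using the code below to collect emails from an IMAP server, but the email bodies often come out very ugly and are sometimes unreadable. Many of the emails contain Danish and Swedish special characters such as æ, ä, ö, ø and å, but I don't think that is the problem. What is the best way to encode and clean the bodies?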

require 'net/imap'
require 'mail'

imap = Net::IMAP.new(address, port, enable_ssl?)
imap.login(user_name, password)
imap.examine(flag)

search_query = "#{last_uid}:*"

imap.uid_search(search_query).each do |uid|
  # "#{last_uid}:*" always matches at least the newest message, so skip UIDs already processed
  if uid.to_i > last_uid.to_i

    header_item = "BODY[HEADER.FIELDS (FROM TO DATE SUBJECT)]"
    header = imap.uid_fetch(uid, header_item)[0].attr[header_item]
    headers = Mail.read_from_string(header) # parse the header block once
    from = headers.from.first
    to = headers.to.first rescue nil
    subject = headers.subject
    date = headers.date

    # BODY[TEXT] returns the message body after the header, exactly as stored (no MIME decoding)
    body = imap.uid_fetch(uid, "BODY[TEXT]")[0].attr["BODY[TEXT]"].gsub(/\r\n?/, "\n").force_encoding('UTF-8')

  end
end
imap.logout
imap.disconnect

Sample body contents:

1:

LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS08YnI+DQpPcmRy
ZWRhdG86IDI4LTAzLTIwMTMgMTQ6NDc6MTg8YnI+DQpPcmRyZW51bW1lcjogMTA5MDM1PGJy
Pg0KVHJhbnNha3Rpb25zSUQ6IDE2NzgyMQ0KPGJyPjxicj4NCkZha3R1cmVyaW5nc2FkcmVz
c2U6PGJyPg0KLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLTxicj48YnI+DQpOaWtsYXMgSnV1bCBOaWVs
c2VuPGJyIC8+QS5QLiBNw7hsbGVyIEtvbGxlZ2lldCAxMDU8YnIgLz41NzAwIFN2ZW5kYm9y
ZzxiciAvPkRlbm1hcms8YnIgLz5UTEY6OiAyMDYzMDczNzxiciAvPjxhIGhyZWY9Im1haWx0
bzpuaWtzQGxpdmUuZGsiPm5pa3NAbGl2ZS5kazwvYT48YnIgLz4NCjxicj48YnI+DQpMZXZl
cmluZ3NhZHJlc3NlOjxicj4NCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS08YnI+PGJyPg0KTmlrbGFz
IEp1dWwgTmllbHNlbjxiciAvPkEuUC4gTcO4bGxlciBLb2xsZWdpZXQgMTA1PGJyIC8+NTcw
MCBTdmVuZGJvcmc8YnIgLz5EZW5tYXJrPGJyIC8+VExGOjogMjA2MzA3Mzc8YnIgLz48YSBo
cmVmPSJtYWlsdG86bmlrc0BsaXZlLmRrIj5uaWtzQGxpdmUuZGs8L2E+PGJyIC8+DQo8YnI+
PGJyPg0KT3JkcmVkYXRhOjxicj4NCi0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t
LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS08YnI+DQoNCiAgMSww
MCBzdGsuIFN0YXIgV2FycyBCYXR0bGVmcm9udCBJSSBYYm94ICg0MTAzMikgw6EgREtLIDI2
Myw5OSAtIElhbHQ6IERLSyAzMjksOTkNCjxicj4NCjxicj4NCkJldGFsaW5nOiAyOiBEYW5z
a2Uga3JlZGl0a29ydCBbdHJhbnNha3Rpb25zZ2VieXIgMSwyNSVdIChES0sgNCwxMykNCjxi
cj4NCkZvcnNlbmRlbHNlOiAgKERLSyAwLDAwKQ0KPGJyPjxicj4NClNhbWxldCBwcmlzIDog
REtLIDMzNCwxMg0KPGJyPg0KSGVyYWYgbW9tczogREtLIDY2LDgzDQo=
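Sample 1 looks like a base64-encoded HTML part: decoded on its own, its last line reads `DKK 334,12`, a `<br>`, and `Heraf moms: DKK 66,83`. Calling `force_encoding('UTF-8')` on such a body only relabels the base64 characters; they still have to be decoded. A minimal stdlib-only sketch, assuming the decoded bytes are UTF-8 and the HTML is as simple as in this sample (`base64_html_to_text` is a hypothetical helper name, and the tag stripping is deliberately naive):

```ruby
# Decode a base64 transfer body containing simple HTML and flatten it to text.
# Sketch only: assumes the decoded bytes are UTF-8 (check the part's charset)
# and that the HTML is simple (<br> line breaks plus a few inline tags).
def base64_html_to_text(encoded)
  html = encoded.unpack1('m').force_encoding('UTF-8') # 'm' = base64; line breaks in the input are ignored
  html.gsub(/\r\n?/, "\n")              # normalize CRLF line endings
      .gsub(%r{<br\s*/?>\n?}i, "\n")    # <br>, <br/>, <br /> (plus a following newline) -> one newline
      .gsub(/<[^>]+>/, '')              # strip remaining tags (naive: no entity decoding)
end

# Last line of sample 1 above
puts base64_html_to_text('REtLIDMzNCwxMg0KPGJyPg0KSGVyYWYgbW9tczogREtLIDY2LDgzDQo=')
# prints "DKK 334,12", an empty line, then "Heraf moms: DKK 66,83"
```

`unpack1('m')` is Ruby's built-in base64 decoder, so no gem is needed for this step; for real-world HTML (entities, tables, styles) an HTML parser is a safer choice than regular expressions.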

2 (shortened):

------=_NextPart_000_0482_01CE2B9E.A689A9F0
Content-Type: multipart/related;
    boundary="----=_NextPart_001_0483_01CE2B9E.A689A9F0"

------=_NextPart_001_0483_01CE2B9E.A689A9F0
Content-Type: multipart/alternative;
boundary="----=_NextPart_002_0484_01CE2B9E.A689A9F0"

------=_NextPart_002_0484_01CE2B9E.A689A9F0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable

=20



            =09
    =09
    =09
=09
=20

=09

                    =09
                        =09
                        =09
                        =09
                        =09
                        =09
                        =09
                        =09

Daily Restock Information.

=09
=09

Item

Format

1+=20

 5+ =20

 Box Price=20

Qty

Barcode

=09
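Sample 2's text part declares `Content-Transfer-Encoding: quoted-printable` and `charset="iso-8859-1"`, so `=20` and `=09` are an encoded space and tab, and the many near-empty lines are layout left over from an HTML table. A sketch of decoding plus simple whitespace cleanup, assuming ISO-8859-1 as in the sample; `qp_text_to_clean_utf8` is a hypothetical name and `K=F8benhavn` is made-up input, not taken from the sample:

```ruby
# Decode a quoted-printable body, convert it to UTF-8 and tidy the whitespace.
# Sketch only: the charset defaults to sample 2's ISO-8859-1; real code should
# read it from the part's Content-Type header.
def qp_text_to_clean_utf8(body, charset = 'ISO-8859-1')
  text = body.unpack1('M')               # =20 -> space, =09 -> tab, =F8 -> byte 0xF8
             .force_encoding(charset)    # label the bytes with their declared charset
             .encode('UTF-8', invalid: :replace, undef: :replace)
  text.gsub(/\r\n?/, "\n")
      .lines.map(&:strip).join("\n")     # trim spaces/tabs at both ends of every line
      .gsub(/\n{3,}/, "\n\n")            # collapse runs of blank lines into a single one
      .strip
end

puts qp_text_to_clean_utf8("=20\n    =09\nDaily Restock Information.=20\n\n=09\nItem\n")
puts qp_text_to_clean_utf8('K=F8benhavn') # hypothetical Danish input, prints "København"
```

Stripping every line also removes intentional indentation; that is a trade-off of this cleanup step, not part of the decoding itself.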

3 (shortened):

--Boundary-=_SHccxHuUYYhTGDGLfcIEBDUToEun
Content-Type: text/plain; charset="ISO-8859-1"


--Boundary-=_SHccxHuUYYhTGDGLfcIEBDUToEun
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="SYSTEMSTOCK.XLSX"
Content-Transfer-Encoding: base64

UEsDBBQABgAIAAAAIQC5OlcVkgEAAIwGAAATAN0BW0NvbnRlbnRfVHlwZXNdLnhtbCCi2QEooAAC
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAMRVyWrDMBC9F/oPRtcSK0mhlBInhy7HNpD0AxRrEovYktBMtr/v2FloimtI
HejF+7xl9EYejLZFHq0hoHE2Eb24KyKwqdPGLhLxOX3rPIoISVmtcmchETtAMRre3gymOw8YcbXF
RGRE/klKTDMoFMbOg+U3cxcKRXwbFtKrdKkWIPvd7oNMnSWw1KESQwwHLzBXq5yi1y0/3iuZGSui
5/13JVUilPe5SRWxULm2+gdJx83nJgXt0lXB0DH6AEpjBkBFHvtgmDFMgIiNoZDDwQebDkZDNFaB

etc.
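Samples 2 and 3 also show that `BODY[TEXT]` of a multipart message still contains the MIME boundary lines, the part headers and any attachments (here a base64-encoded `SYSTEMSTOCK.XLSX`). The question already loads the `mail` gem, which can parse a full message together with its parts; purely to make the required steps visible, here is a naive stdlib-only sketch that splits a single-level multipart body on its boundary and keeps only `text/*` parts. It assumes the boundary is already known (it is declared in the top-level `Content-Type` header, which `BODY[TEXT]` does not include) and does not handle nested multiparts or folded header lines; `split_multipart` is a hypothetical name and the text body is a placeholder:

```ruby
# Naive single-level MIME splitter (sketch, not a full RFC 2046 parser):
# no nested multiparts, no folded header lines, boundary must be known.
def split_multipart(raw, boundary)
  delimiter = Regexp.escape("--#{boundary}")
  raw.gsub(/\r\n?/, "\n")
     .split(/^#{delimiter}(?:--)?[ \t]*\n?/)
     .drop(1)                              # text before the first boundary is preamble
     .reject { |part| part.strip.empty? }  # e.g. an empty epilogue after the closing boundary
     .map do |part|
       head, body = part.split(/^\n/, 2)   # headers end at the first empty line
       headers = head.to_s.lines.each_with_object({}) do |line, h|
         name, value = line.split(':', 2)
         h[name.strip.downcase] = value.strip if value
       end
       { headers: headers, body: body.to_s.chomp }
     end
end

# Structure copied from sample 3; the text body is a made-up placeholder
boundary = 'Boundary-=_SHccxHuUYYhTGDGLfcIEBDUToEun'
raw = <<~MIME
  --#{boundary}
  Content-Type: text/plain; charset="ISO-8859-1"

  (hypothetical text body)
  --#{boundary}
  Content-Type: application/octet-stream
  Content-Disposition: attachment; filename="SYSTEMSTOCK.XLSX"
  Content-Transfer-Encoding: base64

  UEsDBBQABgAIAAAAIQC5OlcVkgEAAIwGAAATAN0BW0NvbnRlbnRfVHlwZXNdLnhtbCCi2QEooAAC
  --#{boundary}--
MIME

text_parts = split_multipart(raw, boundary)
               .select { |p| p[:headers]['content-type'].to_s.start_with?('text/') }
text_parts.each { |p| puts p[:body] }
```

Each selected part still has to be decoded according to its own `Content-Transfer-Encoding` and `charset`.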

Best answer

I spent a lot of time solving this problem, so I'm adding my answer to the several threads I found about it...

https://stackoverflow.com/a/26604049/2386548

Hope this helps someone...
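A rough, stdlib-only illustration of the general idea (not taken from the linked answer): each text part needs its `Content-Transfer-Encoding` undone and its bytes converted from the declared charset to UTF-8, and `force_encoding('UTF-8')` in the question's code does neither. `decode_part` and the shape of the `headers` hash (lower-cased header names) are assumptions of this sketch, and `Bl=E5b=E6rgr=F8d` is made-up input:

```ruby
# Turn one MIME part body into a UTF-8 String, driven by the part's own headers.
# Hypothetical helper: `headers` is assumed to be a Hash with lower-cased names.
def decode_part(body, headers)
  # 1. Undo the Content-Transfer-Encoding
  bytes =
    case headers['content-transfer-encoding'].to_s.strip.downcase
    when 'base64'           then body.unpack1('m') # base64
    when 'quoted-printable' then body.unpack1('M') # =XX escapes and soft line breaks
    else body.b                                    # 7bit/8bit/binary: keep the bytes as-is
    end

  # 2. Convert from the declared charset (default UTF-8) to UTF-8
  charset  = headers['content-type'].to_s[/charset="?([^";\s]+)/i, 1] || 'UTF-8'
  encoding = (Encoding.find(charset) rescue Encoding::UTF_8) # unknown names fall back to UTF-8
  encoding = Encoding::UTF_8 if encoding.dummy?              # e.g. UTF-7 cannot be transcoded directly
  bytes.force_encoding(encoding)
       .encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')
       .scrub('?') # make sure no invalid UTF-8 bytes remain
end

headers = {
  'content-type' => 'text/plain; charset="iso-8859-1"',
  'content-transfer-encoding' => 'quoted-printable'
}
puts decode_part('Bl=E5b=E6rgr=F8d', headers) # prints "Blåbærgrød"
```

With the `mail` gem the question already loads, parsing the whole message and reading its `text_part` or `html_part` is likely less work than maintaining helpers like this one.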

Original question on Stack Overflow: https://stackoverflow.com/questions/16346725/
