Python: Trying to email an href

Tags: python beautifulsoup scrape

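The code below pulls the headlines from ESPN/college-football. I can capture each article's title and link. I can print both, but I also want to email them. I can get the title into the email, but not the href attribute. Any idea what's going on?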

from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import smtplib

# Copy all of the content from the provided web page
webpage = urlopen('http://espn.go.com/college-football').read()
soup = BeautifulSoup(webpage)    

# to get the contents of <ul> tags w/ attribute class="headlines":
for i in soup.findAll('ul', {'class': 'headlines'}):
    for tag in i.findAll('li'):
        for a in tag.findAll({'a' : True, 'title' : False}):            
            print a.text
            print a['href']                                
            print "\n"

            url = str(a.get('href'))
            fromaddr = 'FROM@from.com'
            toaddrs  = 'TO@to.com'            


            # Credentials (if needed)
            username = 'username'
            password = 'password'

            # The actual mail send
            server = smtplib.SMTP('smtp.gmail.com', 587)
            server.set_debuglevel(1)
            server.ehlo()
            server.starttls()
            server.login(username,password)
            server.sendmail(fromaddr, toaddrs, url)
            server.quit()

The console in Eclipse shows:

reply: '235 2.7.0 Accepted\r\n'
reply: retcode (235); Msg: 2.7.0 Accepted
send: 'mail FROM:<person> size=106\r\n'
reply: '250 2.1.0 OK a9sm22683966anb.6\r\n'
reply: retcode (250); Msg: 2.1.0 OK a9sm22683966anb.6
send: 'rcpt TO:<emailHere>\r\n'
reply: '250 2.1.5 OK a9sm22683966anb.6\r\n'
reply: retcode (250); Msg: 2.1.5 OK a9sm22683966anb.6
send: 'data\r\n'
reply: '354  Go ahead a9sm22683966anb.6\r\n'
reply: retcode (354); Msg: Go ahead a9sm22683966anb.6
data: (354, 'Go ahead a9sm22683966anb.6')
send: 'http://espn.go.com/college-sports/story/_/id/8878732/lawyer-ncaa-miami-hurricanes-investigation-says-patsy\r\n.\r\n'
reply: '250 2.0.0 OK 1359087354 a9sm22683966anb.6\r\n'
reply: retcode (250); Msg: 2.0.0 OK 1359087354 a9sm22683966anb.6
data: (250, '2.0.0 OK 1359087354 a9sm22683966anb.6')
send: 'quit\r\n'
reply: '221 2.0.0 closing connection a9sm22683966anb.6\r\n'
reply: retcode (221); Msg: 2.0.0 closing connection a9sm22683966anb.6

But the link never shows up in the email.

Best answer

Try using the email package to format the message:

# -*- coding: utf-8 -*-
from email.header    import Header
from email.mime.text import MIMEText

msg = MIMEText('put message body here…', 'plain', 'utf-8')
msg['Subject'] = Header('here goes subject…', 'utf-8')
msg['From'] = 'from@gmail.com'
msg['To'] = 'to@example.com'
print(msg.as_string())

Output:

Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Subject: =?utf-8?q?here_goes_subject=E2=80=A6?=
From: from@gmail.com
To: to@example.com

cHV0IG1lc3NhZ2UgYm9keSBoZXJl4oCm
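One likely reason the URL never appeared, though the answer does not spell it out: sendmail() transmits its third argument verbatim, and mail software expects headers, then a blank line, then the body, as in the output above. A bare URL has no blank line and begins with "http:", so it can be read as a header named http, leaving nothing for the body. The sketch below uses the standard library parser to illustrate this. The shortened URL is a stand-in rather than the one from the question, and how a particular mail provider treats such a message is an assumption.

```python
from email import message_from_string

# A bare URL used as the whole message, as in the question's sendmail() call.
# The URL is shortened here and is only an illustration.
raw = 'http://espn.go.com/college-football\r\n'

parsed = message_from_string(raw)

# With no blank line, the parser treats the line as a header:
# everything before the first colon becomes the header name.
print(parsed.keys())               # header names the parser found
print(parsed['http'])              # text after the first colon
print(repr(parsed.get_payload()))  # what is left over for the body
```

A plain title usually contains no colon, which would explain why it came through as body text while the link did not.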

To send it via Gmail:

from smtplib import SMTP_SSL

# login and password are your Gmail credentials; msg is the message built above
s = SMTP_SSL('smtp.gmail.com')
s.set_debuglevel(1)
try:
    s.login(login, password)
    s.sendmail(msg['From'], msg['To'], msg.as_string())
finally:
    s.quit()
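Putting the two snippets together for the scraping loop in the question, a minimal sketch might look like the following. The helper names build_headline_message and send_message, the sample headline, and the addresses are all hypothetical. The code is written for Python 3, whereas the question uses Python 2.

```python
from email.header import Header
from email.mime.text import MIMEText
from smtplib import SMTP_SSL


def build_headline_message(title, url, fromaddr, toaddr):
    """Build a message with the headline as subject and title plus link as body."""
    body = '%s\n%s\n' % (title, url)
    msg = MIMEText(body, 'plain', 'utf-8')
    msg['Subject'] = Header(title, 'utf-8')
    msg['From'] = fromaddr
    msg['To'] = toaddr
    return msg


def send_message(msg, login, password):
    """Send msg through Gmail over SSL, following the answer's snippet."""
    s = SMTP_SSL('smtp.gmail.com')
    try:
        s.login(login, password)
        s.sendmail(msg['From'], msg['To'], msg.as_string())
    finally:
        s.quit()


# Hypothetical headline and addresses, for illustration only.
msg = build_headline_message(
    'Example headline',
    'http://espn.go.com/college-football',
    'from@gmail.com',
    'to@example.com',
)
print(msg.as_string())
# send_message(msg, 'username', 'password')  # needs real credentials
```

Inside the question's loop, a.text and a['href'] would be passed as title and url. Because the link now sits in the body, after the headers and a blank line, it should no longer be mistaken for a header.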

Regarding "Python: Trying to email an href", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/14515294/
