python - Can you help me work through this error message?

Tags: python error-handling web-scraping beautifulsoup

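I've written the following code in Python. It goes to each URL in an array and pulls a specific piece of information from that page: a web scraper of sorts. This one takes a list of Reddit threads and outputs the score of each thread. The program almost never runs to completion. Usually I get through five or so iterations before receiving the error message below. Can someone help me get to the bottom of this?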

import urllib2
from bs4 import BeautifulSoup

urls = ['http://www.reddit.com/r/videos/comments/1i12o2/soap_precursor_to_a_lot_of_other_hilarious_shows/', 'http://www.reddit.com/r/videos/comments/1i12nx/kid_reporter_interviews_ryan_reynolds/', 'http://www.reddit.com/r/videos/comments/1i12ml/just_my_two_boys_going_full_derp_shocking_plot/']

for x in urls:
    f = urllib2.urlopen(x)
    data = f.read()
    soup = BeautifulSoup(data)
    span = soup.find('span', attrs={'class':'number'})
    print '{}:{}'.format(x, span.text)

The error message I get is:
Traceback (most recent call last):
  File "C:/Users/jlazarus/Documents/YouTubeparse2.py", line 7, in <module>
    f = urllib2.urlopen(x)
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 410, in open
    response = meth(req, response)
  File "C:\Python27\lib\urllib2.py", line 523, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\urllib2.py", line 448, in error
    return self._call_chain(*args)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 531, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 429: Unknown

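Best answer

Wrap the request in a try/except block to catch the error. If you just want to skip over the URLs that fail, this is what you want.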

import urllib2
from bs4 import BeautifulSoup

urls = ['http://www.reddit.com/r/videos/comments/1i12o2/soap_precursor_to_a_lot_of_other_hilarious_shows/', 'http://www.reddit.com/r/videos/comments/1i12nx/kid_reporter_interviews_ryan_reynolds/', 'http://www.reddit.com/r/videos/comments/1i12ml/just_my_two_boys_going_full_derp_shocking_plot/']

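for x in urls:
    try:
        f = urllib2.urlopen(x)
        data = f.read()
        soup = BeautifulSoup(data)
        # The thread score is in the first <span class="number"> element.
        span = soup.find('span', attrs={'class':'number'})
        print '{}:{}'.format(x, span.text)
    # HTTPError lives in the urllib2 module, so it must be qualified here;
    # a bare "except HTTPError" would raise NameError.
    except urllib2.HTTPError:
        print("HTTP Error, continuing")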
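
Status code 429 means "Too Many Requests", so skipping the error silently drops every URL that was rate limited. If you would rather wait and try again, something like the following might work. This is only a minimal sketch and goes beyond the answer above: it assumes Python 3 (where urllib2 was split into urllib.request and urllib.error), and the function name fetch_with_retry, the User-Agent string, the number of attempts, and the delays are all hypothetical choices made for illustration. Whether a custom User-Agent actually helps with a given site is also an assumption.

```python
import time
import urllib.error
import urllib.request


def fetch_with_retry(url, opener=urllib.request.urlopen, max_attempts=4,
                     base_delay=2.0, sleep=time.sleep):
    """Return the response body for url, retrying when the server sends 429.

    All names and default values here are hypothetical, chosen for this
    sketch; they are not part of the original answer.
    """
    # Assumption: the site treats an identifiable client more leniently.
    # The User-Agent string is a placeholder.
    request = urllib.request.Request(
        url, headers={'User-Agent': 'score-scraper-example/0.1'})
    for attempt in range(max_attempts):
        try:
            return opener(request).read()
        except urllib.error.HTTPError as err:
            # Re-raise anything that is not rate limiting, and give up
            # once the last attempt has failed.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** attempt)
```

In the loop above, `data = fetch_with_retry(x)` would take the place of the `urlopen` and `read` calls. The `opener` and `sleep` parameters exist only so the function can be exercised without touching the network or actually waiting.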

Regarding "python - Can you help me work through this error message?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/17622753/
