python - Handling exceptions from urllib2 and mechanize in Python

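I'm new to exception handling. I'm using the mechanize module to scrape several websites. My program fails frequently because the connection is slow and requests time out. I'd like to retry a website (after a timeout, for example) up to 5 times, with a 30-second delay between attempts.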

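I looked at this Stack Overflow answer and can see how to handle various exceptions. I can also see (although it looks clumsy) how to put the try/except inside a while loop to control the 5 attempts... but I don't understand how to break out of the loop, or "continue", when the connection succeeds and no exception is thrown.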

import sys
from time import sleep

import mechanize
from mechanize import Browser

b = Browser()
tried = 0
while tried < 5:
  try:
    r = b.open('http://www.google.com/foobar')
  except (mechanize.HTTPError, mechanize.URLError) as e:
    if isinstance(e, mechanize.HTTPError):
      print(e.code)
      tried += 1
      sleep(30)
      if tried > 4:
        sys.exit()
    else:
      print(e.reason.args)
      tried += 1
      sleep(30)
      if tried > 4:
        sys.exit()

print("How can I get to here after the first successful b.open() attempt????")

I'd appreciate suggestions on (1) how to break out of the loop after a successful open, and (2) how to make the whole block less clumsy and more elegant.

Best answer

Your first question can be solved with break:

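while tried < 5:
  try:
    r = b.open('http://www.google.com/foobar')
    break  # only reached if b.open() did not raise
  except (mechanize.HTTPError, mechanize.URLError) as e:
    print(e)  # etc.: handle the error as before
    tried += 1
    sleep(30)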
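To check that break only runs when the open call returns normally, here is a minimal self-contained sketch. flaky_open is a hypothetical stand-in for b.open() that fails twice with the standard library's urllib.error.URLError before succeeding, so it runs without network access or mechanize. It uses a for ... else loop: the else clause runs only if the loop finished without hitting break, i.e. every attempt failed.

```python
from urllib.error import URLError

calls = {"n": 0}


def flaky_open(url):
    """Hypothetical stand-in for b.open(): fails twice, then succeeds."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise URLError("timed out")
    return "response for " + url


result = None
for attempt in range(1, 6):  # at most 5 attempts
    try:
        result = flaky_open("http://www.google.com/foobar")
        break  # reached only if flaky_open() did not raise
    except URLError as e:
        print("attempt", attempt, "failed:", e.reason)
        # a real script would call time.sleep(30) here
else:
    # runs only when the loop never hit `break`
    raise SystemExit("giving up after 5 attempts")

print(result)  # response for http://www.google.com/foobar
print(calls["n"])  # 3
```

Because the loop has a fixed bound, there is no separate counter to increment or compare by hand.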

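However, the real problem is that you don't really want to do that: it produces what is called "spaghetti code". If you try to draw the flow of execution through the program, it looks like a plate of spaghetti.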

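The real problem you're having (IMHO) is that your logic for exiting the while loop is flawed. Rather than trying to stop after a number of attempts (a condition that is never reached, because you have already called exit() by then), loop until you have a connection: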

import sys
from time import sleep

import mechanize

b = mechanize.Browser()
tried = 0
connected = False
while not connected:
    try:
        r = b.open('http://www.google.com/foobar')
        connected = True  # if the line above fails, this is never executed
    except mechanize.HTTPError as e:  # catch the more specific HTTPError first
        print(e.code)
        tried += 1
        if tried > 4:
            sys.exit()
        sleep(30)
    except mechanize.URLError as e:
        print(e.reason.args)
        tried += 1
        if tried > 4:
            sys.exit()
        sleep(30)

# Do stuff
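For the "less clumsy" part of the question, the retry logic can also be pulled into a small reusable function. This is a sketch under stated assumptions, not a mechanize API: open_with_retries is a hypothetical helper that takes any opener function, and its default exception tuple uses the standard library's urllib.error.URLError (in urllib, HTTPError is a subclass of URLError, so HTTP errors are caught too). Instead of exiting the program, it re-raises the last error so the caller decides what to do.

```python
import time
from urllib.error import URLError


def open_with_retries(open_func, url, tries=5, delay=30.0,
                      exceptions=(URLError,)):
    """Hypothetical helper: call open_func(url), retrying up to `tries` times.

    Sleeps `delay` seconds between attempts and re-raises the last
    exception if every attempt fails.
    """
    for attempt in range(1, tries + 1):
        try:
            return open_func(url)  # success: return (and leave the loop) at once
        except exceptions as e:
            print("attempt %d/%d failed: %r" % (attempt, tries, e))
            if attempt == tries:
                raise  # out of attempts: let the caller handle it
            time.sleep(delay)
```

With mechanize, a call might look like open_with_retries(b.open, 'http://www.google.com/foobar', exceptions=(mechanize.HTTPError, mechanize.URLError)), assuming b is a mechanize Browser as in the code above.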

For python - Handling exceptions from urllib2 and mechanize in Python, a similar question was found on Stack Overflow: https://stackoverflow.com/questions/15613733/
