python - Tornado AsyncHTTPClient.fetch exception

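I am using tornado.httpclient.AsyncHTTPClient.fetch to fetch domains from a list. When I fetch domains at a large interval (for example, 500 ms), everything works fine, but when I reduce the interval to 100 ms, the following exception occurs from time to time: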


Traceback (most recent call last):
  File "/home/crchemist/python-2.7.2/lib/python2.7/site-packages/tornado/simple_httpclient.py", line 289, in cleanup
    yield
  File "/home/crchemist/python-2.7.2/lib/python2.7/site-packages/tornado/stack_context.py", line 183, in wrapped
    callback(*args, **kwargs)
  File "/home/crchemist/python-2.7.2/lib/python2.7/site-packages/tornado/simple_httpclient.py", line 384, in _on_chunk_length
    self._on_chunk_data)
  File "/home/crchemist/python-2.7.2/lib/python2.7/site-packages/tornado/iostream.py", line 180, in read_bytes
    self._check_closed()
  File "/home/crchemist/python-2.7.2/lib/python2.7/site-packages/tornado/iostream.py", line 504, in _check_closed
    raise IOError("Stream is closed")
IOError: Stream is closed

What causes this behavior? The code looks like this:


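from tornado import ioloop
from tornado.httpclient import AsyncHTTPClient, HTTPRequest

# handle_domain, CRAWLER_USER_AGENT and domains are defined elsewhere (not shown).
def fetch_domain(domain):
    http_client = AsyncHTTPClient()
    request = HTTPRequest('http://' + domain,
                          user_agent=CRAWLER_USER_AGENT)
    http_client.fetch(request, handle_domain)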


class DomainFetcher(object):
    def __init__(self, domains_iterator):
        self.domains = domains_iterator

    def __call__(self):
        try:
            domain = next(self.domains)
        except StopIteration:
            domain_generator.stop()
            ioloop.IOLoop.instance().stop()
        else:
            fetch_domain(domain)

domain_generator = ioloop.PeriodicCallback(DomainFetcher(domains), 500)
domain_generator.start()

Best answer

Note that tornado.ioloop.PeriodicCallback takes a cycle time in integer milliseconds, while the HTTPRequest object is configured with connect_timeout and/or request_timeout in float seconds (see the doc).

"Users browsing the Internet feel that responses are 'instant' when delays are less than 100 ms from click to response" (from Wikipedia). See this ServerFault question for normal latency values.

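IOError: Stream is closed is validly raised to inform you that your connection timed out without completing or, more precisely, that the callback was invoked manually on a pipe that was not open yet. This is fine, since latency above 100 ms is not unusual; if you expect your fetches to complete reliably, you should raise this value.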

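Once your timeout is set to something sane, consider wrapping your fetches in a try/except retry loop, since this is a normal exception that you can expect to occur in production. Just be careful to set a retry limit!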
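
A retry loop with a limit might look roughly like the sketch below. It is only an illustration under some assumptions: it uses Python 3 with the standard asyncio module rather than the Python 2.7 callback style from the question, and fetch_with_retry, fetch, max_retries, base_delay and retry_on are hypothetical names, not part of Tornado. The fetch argument stands in for whatever coroutine performs the real HTTP call.

```python
import asyncio


async def fetch_with_retry(fetch, url, max_retries=3, base_delay=0.5,
                           retry_on=(IOError,), **fetch_kwargs):
    """Await fetch(url, **fetch_kwargs), retrying a limited number of times.

    fetch is any coroutine function (a stand-in for the real HTTP call).
    Only exceptions listed in retry_on trigger a retry.
    """
    attempt = 0
    while True:
        try:
            return await fetch(url, **fetch_kwargs)
        except retry_on:
            attempt += 1
            if attempt > max_retries:
                raise  # retry limit reached: give up and surface the error
            # Wait a little longer after each failed attempt.
            await asyncio.sleep(base_delay * attempt)
```

In recent Tornado versions, where AsyncHTTPClient().fetch returns an awaitable, that method could probably be passed as fetch, with request_timeout and connect_timeout forwarded as keyword arguments. Which exceptions deserve a retry depends on the client and version in use, which is why retry_on is left configurable here.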


Since you are using an async framework, why not let it handle the async callbacks itself instead of running said callback at a fixed interval? Epoll/kqueue are efficient and supported by this framework.

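from tornado import httpclient, ioloop

# Callback-style fetch, as in the Tornado version used in the question
# (the callback argument was removed in Tornado 6.0).
def handle_request(response):
    if response.error:
        print("Error: %s" % response.error)
    else:
        print(response.body)
    # Stop the loop once the single request has been handled.
    ioloop.IOLoop.instance().stop()

http_client = httpclient.AsyncHTTPClient()
http_client.fetch("http://www.google.com/", handle_request)
ioloop.IOLoop.instance().start()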

^ Taken from the doc.

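If you go this route, the only gotcha is to code your request queue so that a maximum number of open connections is enforced. Otherwise, you are likely to run into race conditions when doing serious scraping.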
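
One way to enforce such a cap is a semaphore around each fetch. The sketch below is a minimal illustration, assuming Python 3 with the standard asyncio module; fetch_all, fetch_one, fetch and max_connections are hypothetical names, and fetch stands in for the coroutine that performs the real request.

```python
import asyncio


async def fetch_all(fetch, urls, max_connections=10):
    """Fetch every URL with at most max_connections requests in flight.

    Returns a list of (url, result_or_exception) pairs in input order.
    """
    semaphore = asyncio.Semaphore(max_connections)

    async def fetch_one(url):
        # Waits here while max_connections fetches are already active.
        async with semaphore:
            try:
                return url, await fetch(url)
            except Exception as exc:
                # Report the failure to the caller instead of aborting the batch.
                return url, exc

    return await asyncio.gather(*(fetch_one(url) for url in urls))
```

Every domain is scheduled up front, and the semaphore decides when each request may actually start, so no fixed timer interval is needed. Tornado's AsyncHTTPClient also has a max_clients setting that limits concurrent requests on the client side, which may be enough on its own for simple cases.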

It has been about a year since I last touched Tornado myself, so please let me know if there are inaccuracies in this response and I will revise it.

Regarding python - Tornado AsyncHTTPClient.fetch exception, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/7703480/
