python - Fetching multiple URLs with asyncio/aiohttp and retrying failures

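I'm trying to write some asynchronous GET requests with the aiohttp package and have most of the pieces figured out, but I'm wondering what the standard approach is for handling failures (which are returned as exceptions).

The general idea of my code so far (after some trial and error, I am following the approach here):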

import asyncio
import aiofiles
import aiohttp
from pathlib import Path

with open('urls.txt', 'r') as f:
    urls = [s.rstrip() for s in f.readlines()]

async def fetch(session, url):
    async with session.get(url) as response:
        if response.status != 200:
            response.raise_for_status()
        data = await response.text()
    # (Omitted: some more URL processing goes on here)
    out_path = Path(f'out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)

async def fetch_all(urls, loop):
    async with aiohttp.ClientSession(loop=loop) as session:
        results = await asyncio.gather(*[fetch(session, url) for url in urls],
                return_exceptions=True)
        return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(fetch_all(urls, loop))

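Now this runs fine:

As expected, the results variable is populated with None entries wherever the corresponding URL (that is, the one at the same index in the urls list, and therefore on the same line of the input file urls.txt) was requested successfully and the corresponding file was written to disk.
This means I can use the results variable to determine which URLs were not successful (the entries in results that are not None).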
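
The check described above can be sketched as follows. This is a minimal illustration, not part of the original code: split_results is a hypothetical helper name, and the sample URLs and the ConnectionError are made-up stand-ins for what gather(..., return_exceptions=True) would return.

```python
def split_results(urls, results):
    """Pair each URL with its gather() result and separate the failures."""
    succeeded, failed = [], []
    for url, result in zip(urls, results):
        # With return_exceptions=True, a failed task yields its exception object.
        if isinstance(result, BaseException):
            failed.append((url, result))
        else:
            succeeded.append(url)
    return succeeded, failed


# Hypothetical data shaped like the output of gather(..., return_exceptions=True).
urls = ["https://example.com/a", "https://example.com/b"]
results = [None, ConnectionError("Connect call failed")]
succeeded, failed = split_results(urls, results)
```

Testing for exception instances rather than for "not None" keeps the check valid even if fetch is later changed to return a value.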

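I have looked at a few different guides to the various asynchronous Python packages (aiohttp, aiofiles, and asyncio), but I haven't seen a standard way to handle this final step.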

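Should the retrying to send a GET request be done after the await statement has 'finished'/'completed'?
...or should the retrying to send a GET request be initiated by some sort of callback upon failure?
- The errors look like this: ClientConnectorError(111, "Connect call failed ('000.XXX.XXX.XXX', 443)"), i.e. the request to IP address 000.XXX.XXX.XXX on port 443 failed, probably because the server imposes some limit that I should respect by waiting for a while before retrying.
Is there some limit I could impose, so that the requests are sent in batches rather than all at once?
I am getting about 40-60 successful requests when attempting a few hundred (over 500) URLs in my list.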
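
One way such a limit could look is an asyncio.Semaphore that caps how many requests are in flight at once. The sketch below is only an illustration under assumptions: bounded_gather, fake_fetch, and the limit of 5 are hypothetical names and values, and fake_fetch stands in for a real request so the example runs without a network.

```python
import asyncio


async def bounded_gather(coros, limit=10):
    """Run coroutines concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(coro):
        # Each coroutine waits here until one of the `limit` slots is free.
        async with semaphore:
            return await coro

    return await asyncio.gather(*(run_one(c) for c in coros),
                                return_exceptions=True)


async def fake_fetch(i, state):
    # Stand-in for a real request; records how many calls overlap.
    state["active"] += 1
    state["peak"] = max(state["peak"], state["active"])
    await asyncio.sleep(0.01)
    state["active"] -= 1
    return i


async def demo():
    state = {"active": 0, "peak": 0}
    coros = [fake_fetch(i, state) for i in range(20)]
    results = await bounded_gather(coros, limit=5)
    return results, state["peak"]


results, peak = asyncio.run(demo())
```

In the code above, fetch_all could pass its list of fetch(session, url) coroutines to such a helper instead of calling asyncio.gather directly. A suitable limit depends on the server and would have to be found by experiment.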

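Naively, I was expecting run_until_complete to handle this in such a way that it would finish upon succeeding at requesting all URLs, but this isn't the case.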

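I haven't worked with asynchronous Python and sessions/loops before, so I would appreciate any help in finding out how to get the results. Please let me know if I can provide any more information to improve this question. Thank you!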

Best answer

Should the retrying to send a GET request be done after the await statement has 'finished'/'completed'? ...or should the retrying to send a GET request be initiated by some sort of callback upon failure

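You can do the former. You don't need any special callback: because you are already executing inside a coroutine, a simple while loop is enough, and it won't interfere with the execution of other coroutines. For example: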

async def fetch(session, url):
    data = None
    while data is None:
        try:
            async with session.get(url) as response:
                response.raise_for_status()
                data = await response.text()
        except aiohttp.ClientError:
            # sleep a little and try again
            await asyncio.sleep(1)
    # (Omitted: some more URL processing goes on here)
    out_path = Path(f'out/')
    if not out_path.is_dir():
        out_path.mkdir()
    fname = url.split("/")[-1]
    async with aiofiles.open(out_path / f'{fname}.html', 'w+') as f:
        await f.write(data)
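
The loop above retries indefinitely with a fixed one-second pause. A variant that gives up after a fixed number of attempts and doubles the delay after each failure might look like the sketch below. The names retry_async, make_call, attempts, and base_delay are hypothetical, and flaky is a stand-in for a real request so the example runs without a network.

```python
import asyncio


async def retry_async(make_call, retry_on=(Exception,), attempts=5, base_delay=1.0):
    """Await make_call() until it succeeds, sleeping longer after each failure."""
    for attempt in range(attempts):
        try:
            return await make_call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the last error
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            await asyncio.sleep(base_delay * 2 ** attempt)


# Stand-in for a request that fails twice and then succeeds.
calls = {"count": 0}


async def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return "payload"


data = asyncio.run(retry_async(flaky, retry_on=(ConnectionError,), base_delay=0.01))
```

In the fetch coroutine, make_call would wrap the session.get block and retry_on would be aiohttp.ClientError. A URL that still fails after the last attempt then surfaces as an exception in the gather results instead of keeping the loop running forever.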

Naively, I was expecting run_until_complete to handle this in such a way that it would finish upon succeeding at requesting all URLs

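The term "complete" is meant in the technical sense of a coroutine completing (running its course), which happens when the coroutine either returns or raises an exception. A fetch that raises has therefore also completed, so run_until_complete returns as soon as every fetch has either succeeded or failed.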
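
A small sketch of this behaviour, using two hypothetical coroutines (succeeds and fails) in place of real requests. It calls asyncio.run rather than loop.run_until_complete; both return once the coroutine they were given has finished.

```python
import asyncio


async def succeeds():
    return "ok"


async def fails():
    raise ValueError("request failed")


async def main():
    # Both coroutines complete: one by returning, the other by raising.
    # return_exceptions=True places the exception object in the result list.
    return await asyncio.gather(succeeds(), fails(), return_exceptions=True)


results = asyncio.run(main())
```

Here results holds the string "ok" followed by the ValueError instance, and the event loop stops even though one of the two coroutines failed.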

Regarding "python - Fetching multiple URLs with asyncio/aiohttp and retrying failures", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/56840527/
