python - Multiple requests over a single connection in Python

Tags: python http https tcp python-requests

When I download a file with a single request, I do the following:

import requests

session = requests.Session()
params = {'fd': 1, 'count': 1024, 'auth': 'auth_token'}
r = session.get('https://httpbin.org/bytes/9', params=params)

print(r.content)
# b'\xb3_\\l\xe2\xbf/:\x07'

How can I issue multiple requests without waiting for the replies?

The server's API documentation says:

You can push multiple requests over single connection without waiting for answer, to improve performance. The server will process the requests in the order they are received and you are guaranteed to receive answers in the same order. It is important however to send all requests with "Connection: keep-alive", otherwise the API server will close the connection without processing the pending requests.

They are talking about a single thread issuing multiple requests without waiting for the answers. I believe this is called HTTP pipelining.
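
As I understand it, at the socket level the behaviour the docs describe would look something like this minimal sketch (example.com is just a placeholder host; the real server has to support pipelining for this to work):

import socket

host = 'example.com'  # placeholder; the real API server must support pipelining
paths = ['/', '/', '/']

# Write every request before reading a single byte of any reply.
raw = b''.join(
    ('GET %s HTTP/1.1\r\n'
     'Host: %s\r\n'
     'Connection: keep-alive\r\n'
     '\r\n' % (path, host)).encode('ascii')
    for path in paths
)

with socket.create_connection((host, 80), timeout=5) as sock:
    sock.sendall(raw)
    reply = b''
    try:
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # server closed the connection
                break
            reply += chunk
    except socket.timeout:
        pass  # connection kept alive; assume everything has arrived

# The status lines come back concatenated, in the same order as the requests.
print(reply.count(b'HTTP/1.1 '), 'responses received')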

How can I do this with the Python Requests library?

A similar answer suggests making parallel calls, but that is not what my question is about. It also says that "requests pools connections, keeping the TCP connection open". How can I make use of that?
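
As far as I can tell, pooling only means the Session keeps one TCP connection open and reuses it across sequential requests; each call still blocks until the previous response arrives. A quick sketch of that, again against httpbin.org:

import requests

session = requests.Session()
# Both requests travel over the same pooled keep-alive connection,
# but the second is not sent until the first response has been read,
# so this is connection reuse, not pipelining.
r1 = session.get('https://httpbin.org/bytes/9')
r2 = session.get('https://httpbin.org/bytes/9')
print(r1.content, r2.content)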

If this can't be done with Requests, is there another synchronous library I could use?

Best answer

You can fetch multiple pages in parallel, without threads. This exploits HTTP pipelining by resetting the state of the HTTPSConnection (a private variable!) to trick it into sending the next request ahead of time.

from http.client import HTTPSConnection, _CS_IDLE
from urllib.parse import urlparse, urlunparse


def pipeline(host, pages, max_out_bound=4, debuglevel=0):
    """Fetch `pages` from `host` over a single HTTPS connection, keeping up
    to `max_out_bound` pipelined requests in flight at any one time."""
    page_count = len(pages)
    conn = HTTPSConnection(host)
    conn.set_debuglevel(debuglevel)
    responses = [None] * page_count
    finished = [False] * page_count
    content = [None] * page_count
    headers = {'Host': host, 'Content-Length': 0, 'Connection': 'Keep-Alive'}

    while not all(finished):
        # Send
        out_bound = 0
        for i, page in enumerate(pages):
            if out_bound >= max_out_bound:
                break
            elif page and not finished[i] and responses[i] is None:
                if debuglevel > 0:
                    print('Sending request for %r...' % (page,))
                conn._HTTPConnection__state = _CS_IDLE  # private variable!
                conn.request("GET", page, None, headers)
                responses[i] = conn.response_class(conn.sock, method=conn._method)
                out_bound += 1
        # Try to read a response
        for i, resp in enumerate(responses):
            if resp is None:
                continue
            if debuglevel > 0:
                print('Retrieving %r...' % (pages[i],))
            out_bound -= 1
            skip_read = False
            resp.begin()
            if debuglevel > 0:
                print('    %d %s' % (resp.status, resp.reason))
            if 200 <= resp.status < 300:
                # Ok
                content[i] = resp.read()
                cookie = resp.getheader('Set-Cookie')
                if cookie is not None:
                    headers['Cookie'] = cookie
                skip_read = True
                finished[i] = True
                responses[i] = None
            elif 300 <= resp.status < 400:
                # Redirect
                loc = resp.getheader('Location')
                responses[i] = None
                parsed = loc and urlparse(loc)
                if not parsed:
                    # Missing or empty location header
                    content[i] = (resp.status, resp.reason)
                    finished[i] = True
                elif parsed.netloc != '' and parsed.netloc != host:
                    # Redirect to another host
                    content[i] = (resp.status, resp.reason, loc)
                    finished[i] = True
                else:
                    path = urlunparse(parsed._replace(scheme='', netloc='', fragment=''))
                    if debuglevel > 0:
                        print('  Updated %r to %r' % (pages[i], path))
                    pages[i] = path
            elif resp.status >= 400:
                # Failed
                content[i] = (resp.status, resp.reason)
                finished[i] = True
                responses[i] = None
            if resp.will_close:
                # Connection (will be) closed, need to resend
                conn.close()
                if debuglevel > 0:
                    print('  Connection closed')
                for j, f in enumerate(finished):
                    if not f and responses[j] is not None:
                        if debuglevel > 0:
                            print('  Discarding out-bound request for %r' % (pages[j],))
                        responses[j] = None
                break
            elif not skip_read:
                resp.read()  # read any data
            if any(not f and responses[j] is None for j, f in enumerate(finished)):
                # Send another pending request
                break
        else:
            break  # All responses are None?
    return content


if __name__ == '__main__':
    domain = 'en.wikipedia.org'
    pages = ['/wiki/HTTP_pipelining', '/wiki/HTTP', '/wiki/HTTP_persistent_connection']
    data = pipeline(domain, pages, max_out_bound=3, debuglevel=1)
    for i, page in enumerate(data):
        print()
        print('==== Page %r ====' % (pages[i],))
        print(page[:512])

A similar question about making multiple requests over a single connection in Python can be found on Stack Overflow: https://stackoverflow.com/questions/63088301/
