python - How to get a secondary list item from ThreadPoolExecutor when sending requests?

Tags: python python-multithreading threadpoolexecutor

The Python documentation for ThreadPoolExecutor includes this request example:

import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))

Suppose the URL list is changed so that each entry also carries a region:

URLS = [['http://www.foxnews.com/','American'],
        ['http://www.cnn.com/','American'],
        ['http://europe.wsj.com/', 'European'],
        ['http://www.bbc.co.uk/', 'European'],
        ['http://some-made-up-domain.com/','Unknown']]

You can easily extract the URL by indexing each entry:

future_to_url = {executor.submit(load_url, url[0], 60): url[0] for url in URLS}

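What I'm struggling with is how to extract the region (index 1) from that list so it can be included in the as_completed results, giving a print statement like this: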

print('%r %r page is %d bytes' % (region, url, len(data)))

Best answer

You can convert the URLS list into a dictionary (url_region_mapper) that maps each URL to its region, so that you can look up the region for any given URL.

import concurrent.futures
import urllib.request

URLS = [['http://www.foxnews.com/','American'],
        ['http://www.cnn.com/','American'],
        ['http://europe.wsj.com/', 'European'],
        ['http://www.bbc.co.uk/', 'European'],
        ['http://some-made-up-domain.com/','Unknown']]

url_region_mapper = dict(URLS)

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url[0], 60): url[0] for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r %r page is %d bytes' % (url_region_mapper[url], url, len(data)))
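
The mapping step can be checked on its own, without any network access. This is a minimal sketch, assuming the same URLS layout as above; the shortened list, the duplicate entry, and the names pairs_with_duplicate and collapsed are hypothetical, added only to illustrate that the dictionary keeps one region per URL:

```python
URLS = [['http://www.foxnews.com/', 'American'],
        ['http://www.cnn.com/', 'American'],
        ['http://europe.wsj.com/', 'European']]

# dict() accepts any iterable of two-item sequences:
# item 0 becomes the key, item 1 becomes the value.
url_region_mapper = dict(URLS)
print(url_region_mapper['http://europe.wsj.com/'])

# Hypothetical duplicate: the same URL listed under a second region.
pairs_with_duplicate = URLS + [['http://www.cnn.com/', 'European']]
collapsed = dict(pairs_with_duplicate)

# Later pairs overwrite earlier ones, so only one region per URL survives.
print(len(pairs_with_duplicate), len(collapsed))
print(collapsed['http://www.cnn.com/'])
```

Because a later pair silently replaces an earlier one with the same key, this lookup is only reliable when every URL appears once in the list.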

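If duplicate URLs map to different regions, you can store both the URL and the region as a list, rather than just the URL string, as the value in the future_to_url dictionary.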

future_to_url = {executor.submit(load_url, url[0], 60): [url[0], url[1]] for url in URLS}

import concurrent.futures
import urllib.request

URLS = [['http://www.foxnews.com/','American'],
        ['http://www.cnn.com/','American'],
        ['http://europe.wsj.com/', 'European'],
        ['http://www.bbc.co.uk/', 'European'],
        ['http://some-made-up-domain.com/','Unknown']]

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url[0], 60): [url[0], url[1]] for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url, region = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r %r page is %d bytes' % (region, url, len(data)))
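
The same pattern can be tried offline. The following is a rough sketch rather than the answer's exact code: fake_load is a hypothetical stand-in for load_url that just returns the URL as bytes, the duplicate CNN entry is invented for illustration, and the names future_to_info and results are assumptions. It stores a (url, region) tuple per future, so duplicate URLs keep their own regions:

```python
import concurrent.futures

URLS = [['http://www.cnn.com/', 'American'],
        ['http://www.cnn.com/', 'European'],   # hypothetical duplicate URL
        ['http://www.bbc.co.uk/', 'European']]

# Hypothetical stand-in for load_url so the sketch runs without network access.
def fake_load(url, timeout):
    return url.encode('ascii')

results = []
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    # Each future is a distinct dictionary key, so it carries its own
    # (url, region) pair even when two entries share the same URL.
    future_to_info = {executor.submit(fake_load, url, 60): (url, region)
                      for url, region in URLS}
    for future in concurrent.futures.as_completed(future_to_info):
        url, region = future_to_info[future]
        results.append((region, url, len(future.result())))

# Sort before printing, since completion order is not guaranteed.
for region, url, size in sorted(results):
    print('%r %r page is %d bytes' % (region, url, size))
```

Keying on the future rather than on the URL is what makes this work: two submissions of the same URL produce two separate Future objects.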

Regarding "python - How to get a secondary list item from ThreadPoolExecutor when sending requests?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/59821508/
