python - Python multithreading with BeautifulSoup

Tags: python multithreading beautifulsoup

This is the function that reads a URL and parses the response with BeautifulSoup:

import requests
from bs4 import BeautifulSoup
from threading import Thread

multithreadding = []

def scraper_worker(url):
    # download the page and keep only the main container
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    data = soup.find("div", {"class": "main-container"})
    multithreadding.append(data)

threadding = []
for u in split_link:
    t = Thread(target=scraper_worker, args=(u,))
    t.start()
    threadding.append(t)

split_link is a list that stores 50-odd links. I am having trouble running the multithreaded part.
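
The question does not say what goes wrong. One gap in the snippet, though, is that the main thread never waits for the workers, so the shared list may be read before it is filled. Here is a minimal sketch of that fix, assuming this is the issue. `fake_scraper_worker`, `collected` and `links` are made-up stand-ins for `scraper_worker`, `multithreadding` and `split_link`, so it runs without network access.

```python
from threading import Thread

collected = []  # in CPython, list.append can be called safely from several threads


def fake_scraper_worker(url):
    # Hypothetical stand-in for scraper_worker: no download, it just records the URL.
    collected.append("parsed:" + url)


links = ["link-" + str(i) for i in range(50)]  # made-up stand-in for split_link

threads = []
for u in links:
    t = Thread(target=fake_scraper_worker, args=(u,))
    t.start()
    threads.append(t)

# Wait for every worker to finish before reading the shared list.
for t in threads:
    t.join()

print(len(collected))
```

The order of items in `collected` depends on which thread finishes first, so it should not be assumed to match the order of `links`.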

Best answer

Here is an example of how to use a queue to send results from the threads to the main thread.

import requests
from bs4 import BeautifulSoup
from threading import Thread
import queue

# --- functions ---

def worker(url, result_queue):  # the queue is passed in as an argument
    r = requests.get(url)

    soup = BeautifulSoup(r.text, "html.parser")
    data = soup.find("span", {"class": "text"}).get_text()

    # send the result to the main thread through the queue
    result_queue.put(data)

# --- main ---

all_links = [
    'http://quotes.toscrape.com/page/' + str(i) for i in range(1, 11)
]

all_threads = []
all_results = []
my_queue = queue.Queue()

# run threads
for url in all_links:
    t = Thread(target=worker, args=(url, my_queue))
    t.start()
    all_threads.append(t)

# get results from queue
while len(all_results) < len(all_links):
    # get() blocks until a worker has put a result on the queue
    data = my_queue.get()
    all_results.append(data)

    # If the loop has other work to do, check queue.empty() first,
    # because get() would otherwise block the loop while the queue is empty:

    #if not my_queue.empty():
    #    data = my_queue.get()
    #    all_results.append(data)

# display results
for item in all_results:
    print(item[:50], '...')
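
One caveat with the loop above: it waits for one result per link, so if a worker raises before calling put(), the main thread blocks forever. Below is a minimal sketch of one way to guard against that, not part of the original answer. `fetch_text`, `safe_worker` and `run_all` are hypothetical names, and `fetch_text` is a network-free stand-in for the requests + BeautifulSoup step that fails on purpose for one URL.

```python
import queue
from threading import Thread


def fetch_text(url):
    # Hypothetical stand-in for the download-and-parse step in worker();
    # it fails for one URL so the error path is exercised.
    if url.endswith("/3"):
        raise ValueError("simulated download failure")
    return "text from " + url


def safe_worker(url, result_queue):
    # Always put exactly one item per URL, success or failure,
    # so the main thread's count of expected results stays correct.
    try:
        result_queue.put((url, fetch_text(url), None))
    except Exception as exc:
        result_queue.put((url, None, exc))


def run_all(urls):
    result_queue = queue.Queue()
    threads = [Thread(target=safe_worker, args=(url, result_queue)) for url in urls]
    for t in threads:
        t.start()

    results, errors = {}, {}
    for _ in urls:
        url, data, exc = result_queue.get()  # blocks until a worker reports
        if exc is None:
            results[url] = data
        else:
            errors[url] = exc

    for t in threads:
        t.join()
    return results, errors


# Made-up URLs; nothing is downloaded in this sketch.
urls = ["http://example.invalid/page/" + str(i) for i in range(1, 6)]
results, errors = run_all(urls)
print(len(results), "succeeded,", len(errors), "failed")
```

Sending the URL along with each result also lets the main thread match results to links, since items arrive in completion order rather than start order.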

Regarding python - Python multithreading with BeautifulSoup, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/48005396/
