python - Running multiple spiders for one website in parallel in Scrapy?

Tags: python web-scraping web-crawler scrapy

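I want to scrape a website that has 2 sections, but my script is not as fast as I need it to be.

Is it possible to launch 2 spiders, one to scrape the first section and another to scrape the second?

I tried writing 2 separate spider classes and running them with: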

scrapy crawl firstSpider
scrapy crawl secondSpider

But I don't think this is a smart approach.

I read the documentation of scrapyd, but I don't know whether it fits my case.

Best answer

I think what you are looking for is something like this:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...

process = CrawlerProcess()
process.crawl(MySpider1)
process.crawl(MySpider2)
process.start() # the script will block here until all crawling jobs are finished

You can read more in the Scrapy documentation: running-multiple-spiders-in-the-same-process.
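If you would rather keep the two `scrapy crawl` commands from the question, another option is to start them as separate operating-system processes and wait for both. The following is only a minimal sketch using the standard library: the helper names `build_commands` and `run_in_parallel` are hypothetical, and it assumes the `scrapy` executable is on your PATH and that the script is run from the project directory.

```python
import subprocess


def build_commands(spider_names):
    """Return one `scrapy crawl <name>` argument list per spider name."""
    return [["scrapy", "crawl", name] for name in spider_names]


def run_in_parallel(commands):
    """Start every command at once, then wait for all of them to finish.

    Returns the exit codes in the same order as `commands`.
    """
    # Popen returns immediately, so all child processes run concurrently.
    processes = [subprocess.Popen(command) for command in commands]
    # wait() blocks until that child exits and returns its exit code.
    return [process.wait() for process in processes]
```

Calling `run_in_parallel(build_commands(["firstSpider", "secondSpider"]))` would launch both crawls from the question at the same time. Compared with `CrawlerProcess`, each spider then runs in its own process with its own log output, which keeps the crawls isolated from each other but costs more memory than sharing a single process.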

Source: a similar question on Stack Overflow: https://stackoverflow.com/questions/39365131/
