python - Running Scrapy from Python

I am trying to run Scrapy from Python. I'm looking at this code (source):

from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log
from testspiders.spiders.followall import FollowAllSpider

spider = FollowAllSpider(domain='scrapinghub.com')
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() # the script will block here

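My problem is that I'm not sure how to adapt this code to run my own spider. I named my spider project "spider_a", and the domain to crawl is specified inside the spider itself.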

What I want to know is this: if I normally run my spider with the following command:

scrapy crawl spider_a

how can I adapt the example Python code above to do the same thing?

Best answer

Just import your spider and pass it to crawler.crawl(), for example:

# adjust this import path to wherever your own spider class is defined
from testspiders.spiders.spider_a import MySpider

spider = MySpider()
crawler.crawl(spider)

Original question on Stack Overflow (python - Running Scrapy from Python): https://stackoverflow.com/questions/18100310/
