python - How does the Crawler object relate to the Spider and Pipeline objects?

I am using Scrapy. I have a pipeline that starts like this:

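import os

import dataset

from . import settings  # assumed: the project's settings module, which defines SETTINGS_PATH


class DynamicSQLlitePipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # Here, you get whatever value was passed through the "table" parameter
        table = getattr(crawler.spider, "table")
        return cls(table)

    def __init__(self, table):
        try:
            # os.path.join instead of a hard-coded "\\" keeps the path portable
            db_path = "sqlite:///" + os.path.join(settings.SETTINGS_PATH, "data.db")
            db = dataset.connect(db_path)
            table_name = table[0:3]  # FIRST 3 LETTERS
            self.my_table = db[table_name]
        except Exception:
            # the rest of the original pipeline is not shown; re-raise so errors surface
            raise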

I've been reading https://doc.scrapy.org/en/latest/topics/api.html#crawler-api, which says:

The main entry point to Scrapy API is the Crawler object, passed to extensions through the from_crawler class method. This object provides access to all Scrapy core components, and it’s the only way for extensions to access them and hook their functionality into Scrapy.

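But I still don't understand the from_crawler method or the Crawler object. How does the Crawler object relate to the Spider and Pipeline objects? How and when is the Crawler instantiated? Is a Spider a subclass of Crawler? I asked Passing scrapy instance (not class) attribute to pipeline, but I don't understand how these pieces fit together.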

Best answer

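The Crawler is actually one of the most important objects in Scrapy's architecture. It is the central piece of the crawl execution logic and "glues" many other pieces together: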

The main entry point to Scrapy API is the Crawler object, passed to extensions through the from_crawler class method. This object provides access to all Scrapy core components, and it’s the only way for extensions to access them and hook their functionality into Scrapy.

One or more crawlers are controlled by a CrawlerRunner or CrawlerProcess instance.

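Now, the from_crawler method available on many Scrapy components is simply a way for those components to access the crawler instance that is running that particular component.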

Also, take a look at the actual implementations of Crawler, CrawlerRunner and CrawlerProcess.

Personally, I have also found that running spiders from a script helps a lot in understanding how Scrapy works internally: check out these detailed step-by-step instructions.

For python - How does the Crawler object relate to the Spider and Pipeline objects?, we found a similar question on Stack Overflow: https://stackoverflow.com/questions/47972671/
