python - How to fix a circular import when inheriting from Scrapy's RetryMiddleware class?

Tags: python scrapy

I am trying to adapt Scrapy's RetryMiddleware class by overriding the _retry method with a copy-pasted version to which I have added just one line. I tried to start my custom middleware module as follows:

import scrapy.downloadermiddlewares.retry
from scrapy.utils.python import global_object_name

However, this produces an

ImportError: cannot import name global_object_name

According to ImportError: Cannot import name X, this kind of error is caused by circular imports, but in this case I cannot easily remove the dependency from the Scrapy source code. How can I solve this?

For completeness, here is the TorRetryMiddleware I am trying to implement:

import logging
import scrapy.downloadermiddlewares.retry
from scrapy.utils.python import global_object_name
import apkmirror_scraper.tor_controller as tor_controller

logger = logging.getLogger(__name__)

class TorRetryMiddleware(scrapy.downloadermiddlewares.retry.RetryMiddleware):
    def __init__(self, settings):
        super(TorRetryMiddleware, self).__init__(settings)
        self.retry_http_codes = {403, 429}                  # Retry on 403 ('Forbidden') and 429 ('Too Many Requests')

    def _retry(self, request, reason, spider):
        '''Same as original '_retry' method, but with a call to 'change_identity' before returning the Request.'''
        retries = request.meta.get('retry_times', 0) + 1

        stats = spider.crawler.stats
        if retries <= self.max_retry_times:
            logger.debug("Retrying %(request)s (failed %(retries)d times): %(reason)s",
                         {'request': request, 'retries': retries, 'reason': reason},
                         extra={'spider': spider})
            retryreq = request.copy()
            retryreq.meta['retry_times'] = retries
            retryreq.dont_filter = True
            retryreq.priority = request.priority + self.priority_adjust

            if isinstance(reason, Exception):
                reason = global_object_name(reason.__class__)

            stats.inc_value('retry/count')
            stats.inc_value('retry/reason_count/%s' % reason)

            tor_controller.change_identity()    # This line is added to the original '_retry' method      

            return retryreq
        else:
            stats.inc_value('retry/max_reached')
            logger.debug("Gave up retrying %(request)s (failed %(retries)d times): %(reason)s",
                         {'request': request, 'retries': retries, 'reason': reason},
                         extra={'spider': spider})
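
(For reference, a custom retry middleware like this is normally enabled through the DOWNLOADER_MIDDLEWARES setting, replacing the stock RetryMiddleware. A minimal sketch follows; the module path apkmirror_scraper.middlewares is an assumption, so adjust it to wherever TorRetryMiddleware actually lives.)

# settings.py -- sketch only; 'apkmirror_scraper.middlewares' is a guessed path
DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,   # disable the built-in retry middleware
    'apkmirror_scraper.middlewares.TorRetryMiddleware': 550,      # plug in the Tor-aware replacement at the same priority
}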

Best Answer

Personally, I don't think this ImportError comes from a circular import. Rather, your Scrapy version most likely does not include scrapy.utils.python.global_object_name yet.

scrapy.utils.python.global_object_name was only introduced in this commit, which is not yet part of any released version (the latest release is v1.3.3), though it is targeted for v1.4.

Please confirm that you are using the code from GitHub and that your copy actually contains that commit.
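
If upgrading is not an option, a common workaround is to fall back to a local definition when the import fails. Below is a minimal sketch, under the assumption that the helper only needs to build the fully qualified name of a class:

try:
    from scrapy.utils.python import global_object_name
except ImportError:
    # Fallback for Scrapy releases that do not ship the helper yet (<= 1.3.x).
    # It simply returns the fully qualified name of a class, e.g.
    # 'twisted.internet.error.TimeoutError'.
    def global_object_name(obj):
        return '%s.%s' % (obj.__module__, obj.__name__)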

Edit:

Regarding:

According to ImportError: Cannot import name X, this type of error is caused by circular imports,

There are many possible causes of an ImportError. Usually the traceback is enough to pin down the root cause. For example:

>>> import no_such_name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named no_such_name

A circular import, on the other hand, produces a completely different traceback, for example:

[pengyu@GLaDOS-Precision-7510 tmp]$ cat foo.py 
from bar import baz
baz = 1
[pengyu@GLaDOS-Precision-7510 tmp]$ cat bar.py 
from foo import baz
baz = 2
[pengyu@GLaDOS-Precision-7510 tmp]$ python -c "import foo"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/foo.py", line 1, in <module>
    from bar import baz
  File "/tmp/bar.py", line 1, in <module>
    from foo import baz
ImportError: cannot import name 'baz'
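
For completeness, when a circular import really is the cause, the usual remedies are to move the shared name into a third module or to defer one of the imports into the function that needs it. A minimal sketch of the deferred-import pattern (the module and function names here are placeholders, not part of the question):

# bar.py -- instead of 'from foo import helper' at module level, import lazily
# inside the function, so loading bar no longer requires foo to be fully
# initialised first and the load-time cycle is broken.
def use_helper():
    from foo import helper
    return helper()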

Regarding "python - How to fix a circular import when inheriting from Scrapy's RetryMiddleware class?", we found a similar question on Stack Overflow: https://stackoverflow.com/questions/43977262/
